nielsr (HF Staff) committed
Commit fe8d8fa · verified · parent b145030

Add metadata and link to paper


Hi! I'm Niels from the Hugging Face community team. I'm opening this PR to enhance your model card with standard metadata:
- Added `pipeline_tag: image-to-image` to ensure the model appears in the correct category on the Hub.
- Added `library_name: diffusers` as the configuration indicates compatibility with the diffusers ecosystem.
- Linked the model to its [Hugging Face paper page](https://huggingface.co/papers/2603.13089).

This metadata helps researchers find and use your work more easily!
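
As a quick illustration of why `library_name: diffusers` is useful, the Hub can then surface a generic loading snippet along these lines. This is only a minimal sketch: the repository id below is a placeholder, and whether this checkpoint loads directly through the generic `DiffusionPipeline` entry point is an assumption, not something this PR verifies.

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder repo id -- replace with the actual Hub id of this model.
repo_id = "your-org/V-Bridge"

# Generic diffusers entry point; the concrete pipeline class is resolved
# from the repository's model_index.json, assuming one is present.
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")
```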

Files changed (1)
  1. README.md +21 -6
README.md CHANGED
@@ -1,24 +1,29 @@
  ---
  license: apache-2.0
+ library_name: diffusers
+ pipeline_tag: image-to-image
  ---
+
  <p align="center">
- 📄 <a href="https://arxiv.org/pdf/2603.13089" target="_blank">Paper</a> &nbsp; | &nbsp;
+ 📄 <a href="https://huggingface.co/papers/2603.13089" target="_blank">Paper</a> &nbsp; | &nbsp;
  🖥️ <a href="https://github.com/Zhengsh123/V-Bridge" target="_blank">Code</a> &nbsp; &nbsp;
  🌐 <a href="https://zhengsh123.github.io/V-Bridge/" target="_blank">Website</a> &nbsp; &nbsp;
  </p>

- This repo contains the model for the paper V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration.
+ This repository contains the model for the paper [V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration](https://huggingface.co/papers/2603.13089).

  # Overview
- Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. While these models have demonstrated impressive generative capability, their potential as general-purpose visual learners remains largely untapped. In this work, we introduce V-Bridge, a framework that bridges this latent capacity to versatile few-shot image restoration tasks. We reinterpret image restoration not as a static regression problem, but as a progressive generative process, and leverage video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs. Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model, rivaling specialized architectures designed explicitly for this purpose. Our findings reveal that video generative models implicitly learn powerful and transferable restoration priors that can be activated with only extremely limited data, challenging the traditional boundary between generative modeling and low-level vision, and opening a new design paradigm for foundation models in visual tasks.
+ Large-scale video generative models are trained on vast and diverse visual data, enabling them to internalize rich structural, semantic, and dynamic priors of the visual world. V-Bridge is a framework that bridges this latent capacity to versatile few-shot image restoration tasks. By reinterpreting image restoration as a progressive generative process, V-Bridge leverages video models to simulate the gradual refinement from degraded inputs to high-fidelity outputs.
+
+ Surprisingly, with only 1,000 multi-task training samples (less than 2% of existing restoration methods), pretrained video models can be induced to perform competitive image restoration, achieving multiple tasks with a single model and rivaling specialized architectures designed explicitly for this purpose.

  # Details

  Our model uses a full fine-tuning approach, with the base model being [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B).

- The following are some of the detailed parameters for inference.
+ The following are some of the detailed parameters for inference:

- ```
+ ```python
  cfg_skip_ratio = 0.15

  sampler_name = "Flow_Unipc"
@@ -58,4 +63,14 @@ num_inference_steps = 50
  More details and usage instructions can be found on [GitHub](https://github.com/Zhengsh123/V-Bridge).

  # Acknowledgements
- We would like to thank the contributors to [Wan-AI](https://huggingface.co/Wan-AI), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) and HuggingFace repositories, for their open research.
+ We would like to thank the contributors to [Wan-AI](https://huggingface.co/Wan-AI), [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) and HuggingFace repositories, for their open research.
+
+ # Citation
+ ```bibtex
+ @article{zheng2026V-Bridge,
+ title={V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration},
+ author={Zheng, Shenghe and Jiang, Junpeng and Li, Wenbo},
+ journal={arXiv preprint arXiv:2603.13089},
+ year={2026}
+ }
+ ```