---
license: apache-2.0
base_model:
- Qwen/Qwen3-VL-2B-Instruct
language:
- en
tags:
- text
- image
- video
- multimodal-embedding
- vidore
- colpali
- colqwen3
- multilingual-embedding
new_version: goodman2001/colqwen3-vlembed-base
pipeline_tag: visual-document-retrieval
---

# ColQwen3-vlembed-base: Visual Retriever built by merging Qwen3-VL-2B-Instruct with Qwen3-VL-Embedding-2B through ColBERT strategy

## 🙏🙏🙏 Why always 2B ??

Due to my limited computing resources, I can currently only conduct some interesting experiments with 2B/4B models. 😝😝😝

## Usage

> [!WARNING]
> This version should not be used: it is solely the base version useful for deterministic LoRA initialization.

This model is built by merging **[Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)** with **[Qwen3-VL-Embedding-2B](https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B)**.
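For readers unfamiliar with the ColBERT strategy named in the title, the late-interaction (MaxSim) scoring it usually refers to can be sketched as below. This is only a toy illustration in plain Python, not this model's inference code: the helper names `l2_normalize` and `maxsim_score`, the 2-dimensional vectors, and the page names are all hypothetical, and a real retriever would produce one embedding per query token and per image patch.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, keep its best
    dot-product match among the document vectors, then sum those maxima."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


# Hand-written toy embeddings (hypothetical values, 2 dims for readability).
query = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
page_a = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]]
page_b = [l2_normalize(v) for v in [[-1.0, 0.0], [1.0, 1.0]]]

# Score every candidate page against the query and pick the highest.
scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
best = max(scores, key=scores.get)
print(scores, best)
```

Because each query vector is matched independently against every document vector, a page only needs one good match per query vector to score well, which is the property that makes this scheme suit pages with many patches.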

## Contact

- Mungeryang: mungerygm@gmail.com / yangguimiao@iie.ac.cn

## Acknowledgments ❤️❤️❤️

> [!WARNING]
> Thanks to the **Colpali team** and **Qwen team** for their excellent open-source works!
> I accomplished this work by **standing on the shoulders of giants~**

## Citation

If you use any datasets or models from this organization in your research, please cite the original ColPali paper as follows:

```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449},
}
```