---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: adapter-transformers
---

This is the **vSearcher** model introduced in the paper ["InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search"](https://arxiv.org/abs/2512.18745).
The model is fine-tuned from `Qwen2.5-VL-7B-Instruct` via reinforcement learning to serve as a visual-search subagent under the **vReasoner** (`GPT-5-mini`).
For more information on how to use this model, see our [GitHub page](https://github.com/m-Just/InSight-o3).

If you find this model useful, please cite:
```bibtex
@inproceedings{li2026insight_o3,
  title={InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search},
  author={Kaican Li and Lewei Yao and Jiannan Wu and Tiezheng Yu and Jierun Chen and Haoli Bai and Lu Hou and Lanqing Hong and Wei Zhang and Nevin L. Zhang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```