---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-text-to-text
library_name: adapter-transformers
---

This is the **vSearcher** model introduced in the paper ["InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search"](https://arxiv.org/abs/2512.18745).
The model is fine-tuned from `Qwen2.5-VL-7B-Instruct` via reinforcement learning to serve as a visual-search subagent under the **vReasoner** (`GPT-5-mini`).
For more information on how to use this model, see our [GitHub page](https://github.com/m-Just/InSight-o3).

If you find this model useful, please cite:
```bibtex
@inproceedings{li2026insight_o3,
  title={InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search},
  author={Kaican Li and Lewei Yao and Jiannan Wu and Tiezheng Yu and Jierun Chen and Haoli Bai and Lu Hou and Lanqing Hong and Wei Zhang and Nevin L. Zhang},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
```