Introduction
This is the pretrained model for the paper "Monet: Reasoning in Latent Visual Space Beyond Images and Language".
Paper: https://arxiv.org/abs/2511.21395
Code: https://github.com/NOVAglow646/Monet
How to use this model: we provide an inference example in our GitHub repo.
Citation
If you find this work useful, please cite it using the following BibTeX entry. Thank you for your support!
@misc{wang2025monetreasoninglatentvisual,
      title={Monet: Reasoning in Latent Visual Space Beyond Images and Language},
      author={Qixun Wang and Yang Shi and Yifei Wang and Yuanxing Zhang and Pengfei Wan and Kun Gai and Xianghua Ying and Yisen Wang},
      year={2025},
      eprint={2511.21395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21395},
}