Introduction

This is the pretrained model for the paper "Monet: Reasoning in Latent Visual Space Beyond Images and Language".

Paper: http://arxiv.org/abs/2511.21395

Code: https://github.com/NOVAglow646/Monet

How to use this model: we provide an inference example in our GitHub repo.
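If you only need to download and load the checkpoint, the sketch below shows one possible way to do so. It assumes the repository id NOVAglow646/Monet-7B and a standard transformers vision-language loading path (AutoProcessor / AutoModelForVision2Seq with trust_remote_code), neither of which is confirmed here; the inference example in the GitHub repo remains the authoritative reference, especially for the latent visual reasoning loop itself.

# Minimal loading sketch (assumption: the checkpoint exposes a standard
# transformers vision-language interface; see the GitHub repo for the
# supported end-to-end inference pipeline).
import torch
from huggingface_hub import snapshot_download
from transformers import AutoProcessor, AutoModelForVision2Seq

local_dir = snapshot_download("NOVAglow646/Monet-7B")  # download the safetensors weights

processor = AutoProcessor.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    local_dir,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
    trust_remote_code=True,
)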

Citation

If you find this work useful, please cite it with the following BibTeX entry. Thank you for your support!

@misc{wang2025monetreasoninglatentvisual,
      title={Monet: Reasoning in Latent Visual Space Beyond Images and Language}, 
      author={Qixun Wang and Yang Shi and Yifei Wang and Yuanxing Zhang and Pengfei Wan and Kun Gai and Xianghua Ying and Yisen Wang},
      year={2025},
      eprint={2511.21395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.21395}, 
}
Model size: 8B parameters (Safetensors, BF16)