Spatial reasoning with vision-language models
Generate captions for images with CapRL
A unified framework for reasoning and reward modeling