Qwen/Qwen2.5-Omni-7B
Any-to-Any
This collection includes all the models, datasets, and Spaces mentioned in the blog post "Vision Language Models: 2025 Update".
Chat with an AI using text, audio, images, and video
A unified multimodal understanding and generation model.
Chat with Kimi-VL, which responds to text, images, video, and PDFs
Generate text from images and queries
Generate answers by combining text and images
Chat about images and text with an AI assistant
Annotate and describe images with text prompts
Generate text and segment images using PaliGemma 2
Demo of ShieldGemma 2, a multimodal safety model
Check if text and images are safe
Chat with a multimodal AI using text, images, or video
Generate responses to video or image inputs