Collections
Collections including paper arxiv:2511.16719
- MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
  Paper • 2511.18373 • Published • 5
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
  Paper • 2511.13288 • Published • 17
- Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
  Paper • 2511.19418 • Published • 27
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 114

- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 24
- DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
  Paper • 2510.16872 • Published • 106
- Scaling Generalist Data-Analytic Agents
  Paper • 2509.25084 • Published • 18
- Scaling Agents via Continual Pre-training
  Paper • 2509.13310 • Published • 117

- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 114
- Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
  Paper • 2512.08765 • Published • 121
- Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
  Paper • 2512.04677 • Published • 166
- LongCat-Image Technical Report
  Paper • 2512.07584 • Published • 17

- Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
  Paper • 2511.14993 • Published • 222
- Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
  Paper • 2511.15065 • Published • 74
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 114
- Canvas-to-Image: Compositional Image Generation with Multimodal Controls
  Paper • 2511.21691 • Published • 33

- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
  Paper • 2510.08540 • Published • 109
- Diffusion Transformers with Representation Autoencoders
  Paper • 2510.11690 • Published • 165
- Spotlight on Token Perception for Multimodal Reinforcement Learning
  Paper • 2510.09285 • Published • 36
- Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
  Paper • 2510.17354 • Published • 33

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23