Collections
Discover the best community collections!
Collections including paper arXiv:2503.11576
-
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper • 2410.09008 • Published • 17 -
answerdotai/ModernBERT-base
Fill-Mask • 0.1B • Updated • 770k • 951 -
answerdotai/ModernBERT-large
Fill-Mask • 0.4B • Updated • 81.9k • 428 -
microsoft/phi-4
Text Generation • 15B • Updated • 509k • 2.19k
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 27 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30
-
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper • 2410.21169 • Published • 30 -
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 54 -
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 30 -
Contextual Document Embeddings
Paper • 2410.02525 • Published • 24
-
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
Paper • 2409.13449 • Published • 12 -
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper • 2503.21460 • Published • 83 -
Unified Multimodal Discrete Diffusion
Paper • 2503.20853 • Published • 9 -
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 117
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts
Paper • 2409.13449 • Published • 12 -
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper • 2503.21460 • Published • 83 -
Unified Multimodal Discrete Diffusion
Paper • 2503.20853 • Published • 9 -
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 117
-
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper • 2410.09008 • Published • 17 -
answerdotai/ModernBERT-base
Fill-Mask • 0.1B • Updated • 770k • 951 -
answerdotai/ModernBERT-large
Fill-Mask • 0.4B • Updated • 81.9k • 428 -
microsoft/phi-4
Text Generation • 15B • Updated • 509k • 2.19k
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 9 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 27 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 35 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 30
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Paper • 2410.21169 • Published • 30 -
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 54 -
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 30 -
Contextual Document Embeddings
Paper • 2410.02525 • Published • 24