- Qwen2.5 Technical Report (Paper • 2412.15115 • Published • 376)
- Qwen2.5-Coder Technical Report (Paper • 2409.12186 • Published • 150)
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement (Paper • 2409.12122 • Published • 4)
- Qwen2.5-VL Technical Report (Paper • 2502.13923 • Published • 208)
Collections including paper arXiv:2401.02954
- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652 • Published • 65)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 48)
- Qwen Technical Report (Paper • 2309.16609 • Published • 37)
- Gemma: Open Models Based on Gemini Research and Technology (Paper • 2403.08295 • Published • 50)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 48)
- Qwen Technical Report (Paper • 2309.16609 • Published • 37)
- GPT-4 Technical Report (Paper • 2303.08774 • Published • 7)
- Gemini: A Family of Highly Capable Multimodal Models (Paper • 2312.11805 • Published • 47)
- Attention Is All You Need (Paper • 1706.03762 • Published • 96)
- LoRA Learns Less and Forgets Less (Paper • 2405.09673 • Published • 89)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 48)
- RAFT: Adapting Language Model to Domain Specific RAG (Paper • 2403.10131 • Published • 72)
- Adapting Large Language Models via Reading Comprehension (Paper • 2309.09530 • Published • 81)
- Gemma: Open Models Based on Gemini Research and Technology (Paper • 2403.08295 • Published • 50)
- Simple and Scalable Strategies to Continually Pre-train Large Language Models (Paper • 2403.08763 • Published • 51)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 48)
- Rethinking Optimization and Architecture for Tiny Language Models (Paper • 2402.02791 • Published • 13)
- Specialized Language Models with Cheap Inference from Limited Domain Data (Paper • 2402.01093 • Published • 47)
- Scavenging Hyena: Distilling Transformers into Long Convolution Models (Paper • 2401.17574 • Published • 17)
- Understanding LLMs: A Comprehensive Overview from Training to Inference (Paper • 2401.02038 • Published • 65)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Paper • 2401.02954 • Published • 48)
- Perspectives on the State and Future of Deep Learning - 2023 (Paper • 2312.09323 • Published • 8)
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization (Paper • 2405.15071 • Published • 41)
- Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning (Paper • 2407.10718 • Published • 19)
- DocLLM: A layout-aware generative language model for multimodal document understanding (Paper • 2401.00908 • Published • 189)
- Learning Vision from Models Rivals Learning Vision from Data (Paper • 2312.17742 • Published • 16)
- PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation (Paper • 2312.17276 • Published • 16)
- Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (Paper • 2401.02669 • Published • 16)