-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 125
Collections
Discover the best community collections!
Collections including paper arxiv:2412.06769
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66 -
Reverse Thinking Makes LLMs Stronger Reasoners
Paper • 2411.19865 • Published • 23 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 9 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 26
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 13
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16
-
On Memorization of Large Language Models in Logical Reasoning
Paper • 2410.23123 • Published • 18 -
LLMs Do Not Think Step-by-step In Implicit Reasoning
Paper • 2411.15862 • Published • 11 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 32
-
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Paper • 2408.07199 • Published • 22 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78 -
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Paper • 2406.12050 • Published • 19 -
Let's Verify Step by Step
Paper • 2305.20050 • Published • 11
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 41 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
-
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 190 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 173 -
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34
-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 125
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 66 -
Reverse Thinking Makes LLMs Stronger Reasoners
Paper • 2411.19865 • Published • 23 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
On Memorization of Large Language Models in Logical Reasoning
Paper • 2410.23123 • Published • 18 -
LLMs Do Not Think Step-by-step In Implicit Reasoning
Paper • 2411.15862 • Published • 11 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 90 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 32
-
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Paper • 2408.07199 • Published • 22 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 78 -
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Paper • 2406.12050 • Published • 19 -
Let's Verify Step by Step
Paper • 2305.20050 • Published • 11
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 9 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 26
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 41 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 118 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 42
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109 -
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper • 2403.14624 • Published • 53 -
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Paper • 2402.12875 • Published • 13
-
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 190 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 72 -
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 173 -
InternLM2 Technical Report
Paper • 2403.17297 • Published • 34