Collections
Discover the best community collections!
Collections including paper arxiv:2501.08313

- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
  Paper • 2509.19371 • Published
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Paper • 2505.06708 • Published • 4
- Selective Attention: Enhancing Transformer through Principled Context Control
  Paper • 2411.12892 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 188

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- Agent-Ark/Toucan-1.5M
  Viewer • Updated • 1.65M • 12k • 176
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 2.38k • 543
- Salesforce/Webscale-RL
  Viewer • Updated • 1.11M • 3.09k • 79
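
The entries above are Hub datasets, so they can be pulled with the datasets library. A minimal sketch, assuming the repos are public and expose a "train" split; streaming avoids downloading the full million-row sets up front:

```python
# Minimal sketch: stream a few rows from one of the datasets listed above.
# Assumes the repo is public and has a "train" split (check the dataset card).
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, row in enumerate(ds):
    print(row)  # inspect the schema of the first few examples
    if i >= 2:
        break
```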

- Rewnozom/agent-zero-v1-a-01
  Text Generation • 4B • Updated • 1
- TheBloke/MythoMax-L2-13B-GGUF
  13B • Updated • 57.8k • 201
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 49k • 403
- QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
  Text Generation • 8B • Updated • 11.1k • 120
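
The models above are GGUF quantizations, which run under llama.cpp rather than transformers. A minimal sketch using llama-cpp-python; the quantization filename pattern is a hypothetical example, so check the repo's file list for the variant you actually want:

```python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
# The filename glob below is a hypothetical example; pick a real file
# from the repo (e.g. a Q4_K_M variant) after listing its contents.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/MythoMax-L2-13B-GGUF",
    filename="*Q4_K_M.gguf",  # glob over quantization variants (assumed present)
    n_ctx=4096,
)
out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=64)
print(out["choices"][0]["text"])
```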

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 141

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 425
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 308
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 209

- MiniMaxAI/MiniMax-Text-01-hf
  Text Generation • 456B • Updated • 9.67k • 8
- MiniMaxAI/MiniMax-M1-80k-hf
  Text Generation • 456B • Updated • 79 • 6
- MiniMaxAI/MiniMax-M1-40k-hf
  Text Generation • Updated • 90 • 10
- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 2.87k • 650
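
The "-hf" suffixed checkpoints above appear to be transformers-compatible conversions of the original MiniMax weights. A minimal sketch, assuming the repo loads through the standard Auto classes; at 456B parameters this needs multi-GPU or offloading, which device_map="auto" arranges:

```python
# Minimal sketch: load a Hub text-generation checkpoint with transformers.
# Assumes the "-hf" repo works with the Auto classes (not verified here);
# a 456B model will not fit on one GPU, so device_map="auto" shards/offloads it.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "MiniMaxAI/MiniMax-Text-01-hf"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

inputs = tok("Lightning attention scales to long contexts because", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```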

- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 2.87k • 650
- MiniMaxAI/MiniMax-VL-01
  Image-Text-to-Text • 456B • Updated • 81k • 280
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- MiniMaxText01
  Space • 💬 Generate responses to text and images in a chat interface • 117

- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 660k • 12.9k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- open-r1/OpenR1-Math-220k
  Viewer • Updated • 450k • 11.3k • 667