Collections
Discover the best community collections!
Collections including paper arxiv:2501.08313

- How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
  Paper • 2509.19371 • Published
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
  Paper • 2505.06708 • Published • 4
- Selective Attention: Enhancing Transformer through Principled Context Control
  Paper • 2411.12892 • Published
- A Survey of Reinforcement Learning for Large Reasoning Models
  Paper • 2509.08827 • Published • 188

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- Agent-Ark/Toucan-1.5M
  Viewer • Updated • 1.65M • 12k • 176
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 2.38k • 543
- Salesforce/Webscale-RL
  Viewer • Updated • 1.11M • 3.09k • 79
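
The entries above are Hub datasets, so they can be pulled with the datasets library. A minimal sketch, assuming the repos are public and expose a "train" split; streaming avoids downloading the full million-row sets up front:

```python
# Minimal sketch: stream a few rows from one of the datasets listed above.
# Assumes the repo is public and has a "train" split (check the dataset card).
from datasets import load_dataset

ds = load_dataset("facebook/natural_reasoning", split="train", streaming=True)
for i, row in enumerate(ds):
    print(row)  # inspect the schema of the first few examples
    if i >= 2:
        break
```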

- Rewnozom/agent-zero-v1-a-01
  Text Generation • 4B • Updated • 1
- TheBloke/MythoMax-L2-13B-GGUF
  13B • Updated • 57.8k • 201
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 49k • 403
- QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
  Text Generation • 8B • Updated • 11.1k • 120
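
The models above are GGUF quantizations, which run under llama.cpp rather than transformers. A minimal sketch using llama-cpp-python; the quantization filename pattern is a hypothetical example, so check the repo's file list for the variant you actually want:

```python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
# The filename glob below is a hypothetical example; pick a real file
# from the repo (e.g. a Q4_K_M variant) after listing its contents.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/MythoMax-L2-13B-GGUF",
    filename="*Q4_K_M.gguf",  # glob over quantization variants (assumed present)
    n_ctx=4096,
)
out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=64)
print(out["choices"][0]["text"])
```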

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 119
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 141

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 425
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 376
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
  Paper • 2404.14219 • Published • 259

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- Group Sequence Policy Optimization
  Paper • 2507.18071 • Published • 308
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 209

- MiniMaxAI/MiniMax-Text-01-hf
  Text Generation • 456B • Updated • 9.67k • 8
- MiniMaxAI/MiniMax-M1-80k-hf
  Text Generation • 456B • Updated • 79 • 6
- MiniMaxAI/MiniMax-M1-40k-hf
  Text Generation • Updated • 90 • 10
- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 2.87k • 650
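
The "-hf" suffixed checkpoints above appear to be transformers-compatible conversions of the original MiniMax weights. A minimal sketch, assuming the repo loads through the standard Auto classes; at 456B parameters this needs multi-GPU or offloading, which device_map="auto" arranges:

```python
# Minimal sketch: load a Hub text-generation checkpoint with transformers.
# Assumes the "-hf" repo works with the Auto classes (not verified here);
# a 456B model will not fit on one GPU, so device_map="auto" shards/offloads it.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "MiniMaxAI/MiniMax-Text-01-hf"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

inputs = tok("Lightning attention scales to long contexts because", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```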

- MiniMaxAI/MiniMax-Text-01
  Text Generation • 456B • Updated • 2.87k • 650
- MiniMaxAI/MiniMax-VL-01
  Image-Text-to-Text • 456B • Updated • 81k • 280
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- MiniMaxText01
  Space • 💬 Generate responses to text and images in a chat interface • 117

- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 660k • 12.9k
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 299
- open-r1/OpenR1-Math-220k
  Viewer • Updated • 450k • 11.3k • 667