Collections
Discover the best community collections!
Collections including paper arxiv:2501.08313

- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 103
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 63
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72

- jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0
  Text Generation • 8B • Updated • 15 • 5
- Janus Pro WebGPU
  🏛 • 215 • In-browser unified multimodal understanding and generation.

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- DavidAU/Llama-3.2-8X3B-MOE-Dark-Champion-Instruct-uncensored-abliterated-18.4B-GGUF
  Text Generation • 18B • Updated • 49k • 404

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 22
- FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
  Paper • 2502.01068 • Published • 18
- Scaling Embedding Layers in Language Models
  Paper • 2502.01637 • Published • 24

- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 249
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 258
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 425
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300

- Evolving Deeper LLM Thinking
  Paper • 2501.09891 • Published • 115
- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 84
- AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
  Paper • 2412.15084 • Published • 13
- The Lessons of Developing Process Reward Models in Mathematical Reasoning
  Paper • 2501.07301 • Published • 99

- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 300
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 285
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 157
- Apollo: An Exploration of Video Understanding in Large Multimodal Models
  Paper • 2412.10360 • Published • 147