Collections including paper arxiv:2502.08910

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 148
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 72
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165
- LLM Pretraining with Continuous Concepts
  Paper • 2502.08524 • Published • 29

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 148
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
  Paper • 2502.18890 • Published • 30
- MPO: Boosting LLM Agents with Meta Plan Optimization
  Paper • 2503.02682 • Published • 28
- SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
  Paper • 2505.20411 • Published • 89

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 148
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 151
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 148
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 153
- The Curse of Depth in Large Language Models
  Paper • 2502.05795 • Published • 40

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 5.71k • 1.22k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 111 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63