Collections including paper arxiv:2510.03279

- Let LLMs Break Free from Overthinking via Self-Braking Tuning
  Paper • 2505.14604 • Published • 23
- AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
  Paper • 2505.16944 • Published • 8
- Training Step-Level Reasoning Verifiers with Formal Verification Tools
  Paper • 2505.15960 • Published • 7
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
  Paper • 2505.15134 • Published • 6

- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 18
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 41
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 10

- Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
  Paper • 2503.11224 • Published • 28
- Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
  Paper • 2503.11579 • Published • 22
- Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
  Paper • 2508.08974 • Published
- NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
  Paper • 2508.14444 • Published • 37

- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 32
- Scavenging Hyena: Distilling Transformers into Long Convolution Models
  Paper • 2401.17574 • Published • 17
- Scalable Autoregressive Image Generation with Mamba
  Paper • 2408.12245 • Published • 26
- Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
  Paper • 2408.12570 • Published • 33