Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6 • 120
Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published Oct 8 • 30
Changelog Introducing HF Jobs: Run scalable compute jobs on Hugging Face Jul 30 • 200
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024 • 70
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models Jul 18 • 50
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24 • 41
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 50
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 151
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 30
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 83
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 26
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 20