Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation Paper • 2510.22115 • Published 19 days ago • 81
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published 15 days ago • 68
Parallel Loop Transformer for Efficient Test-Time Computation Scaling Paper • 2510.24824 • Published 16 days ago • 14
Uniform Discrete Diffusion with Metric Path for Video Generation Paper • 2510.24717 • Published 16 days ago • 39
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published 28 days ago • 37
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7 • 52
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 67
Inpainting-Guided Policy Optimization for Diffusion Large Language Models Paper • 2509.10396 • Published Sep 12 • 15
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8 • 56
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning Paper • 2508.18756 • Published Aug 26 • 36
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published Aug 20 • 22
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8 • 30
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models Paper • 2508.09138 • Published Aug 12 • 36