Article: The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs • Published 3 days ago
Paper: LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls • 2511.09148 • Published 6 days ago
Paper: Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs • 2511.07419 • Published 8 days ago
Paper: Too Good to be Bad: On the Failure of LLMs to Role-Play Villains • 2511.04962 • Published 11 days ago
Collection: SYNTH • Fully generalist synthetic dataset and SOTA small reasoners • 3 items • Updated 8 days ago
Collection: Common Pile v0.1 • All resources related to Common Pile v0.1, an 8 TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6
Model: autoweeb/Qwen-Image-Edit-2509-Photo-to-Anime • Image-to-Image • Updated 7 days ago
Space: The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters