LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published 3 days ago • 15
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Paper • 2511.07419 • Published 4 days ago • 23
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published 8 days ago • 50
SYNTH Collection Fully generalist synthetic dataset and SOTA small reasoners • 3 items • Updated 5 days ago • 9
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6 • 36
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published 29 days ago • 47
Document Understanding, Measurement, and Manipulation Using Category Theory Paper • 2510.21553 • Published 22 days ago • 4
QueST: Incentivizing LLMs to Generate Difficult Problems Paper • 2510.17715 • Published 25 days ago • 33
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs Paper • 2510.08886 • Published Oct 10 • 19
Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context Paper • 2510.06182 • Published Oct 7 • 8
Fine-Tuning on Noisy Instructions: Effects on Generalization and Performance Paper • 2510.03528 • Published Oct 3 • 16