Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning Paper • 2509.24372 • Published Sep 29 • 9
Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning Paper • 2508.09726 • Published Aug 13 • 14
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7 • 101
Training and Finetuning Reranker Models with Sentence Transformers v4 Article • Mar 26 • 174
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23 • 56
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18 • 39
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30 • 49
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23 • 81
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 66
Synthetic (text) Dataset Generation Collection • Papers about synthetic dataset generation • 9 items • Updated Jun 21, 2024 • 9
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning Paper • 2410.02089 • Published Oct 2, 2024 • 13
Rank-Stabilized LoRA: Unlocking the Potential of LoRA Fine-Tuning Article • Feb 20, 2024 • 29
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 46