On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Charlie Zhang, Graham Neubig, Xiang Yue
Carnegie Mellon University, Language Technologies Institute
Does Reinforcement Learning Truly Extend Reasoning?
This work addresses a long-standing disagreement about whether RL genuinely extends language models' reasoning abilities: some characterize RL as merely refining capabilities already present after pre-training, while others argue it induces new compositional skills. The disagreement persists largely because modern training pipelines offer little experimental control. Our work aims to resolve this conflict through a fully controlled analysis; beyond that analysis, this repository also provides the mid-training checkpoints used in the extrapolation tasks.
Overview
Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
- Extrapolative generalization to more complex compositions (deeper dependency graphs).
- Contextual generalization across diverse surface forms and linguistic contexts.
- How RL interacts with prior knowledge, and when it yields genuine capability gains beyond pre-training.
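To make the task setup concrete, below is a minimal, hypothetical sketch of one synthetic instance: a chain-structured dependency graph of atomic arithmetic operations whose depth controls compositional complexity, together with a step-by-step trace that can be verified programmatically. The names (`sample_problem`, `ATOMIC_OPS`), the specific operations, and the value ranges are illustrative assumptions, not the paper's actual task generator, which will ship with the code release.

```python
# Illustrative sketch only: a toy generator for a math-style reasoning task
# built from explicit atomic operations with a process-verifiable trace.
# Operation set, variable names, and ranges are assumptions for illustration.
import random

ATOMIC_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def sample_problem(depth, seed=None):
    """Build a chain-structured dependency graph of `depth` atomic steps."""
    rng = random.Random(seed)
    values = {"x0": rng.randint(1, 9)}   # leaf variable of the graph
    trace = []                            # process-verifiable reasoning steps
    for i in range(1, depth + 1):
        op = rng.choice(list(ATOMIC_OPS))
        operand = rng.randint(1, 9)
        prev, cur = f"x{i-1}", f"x{i}"
        values[cur] = ATOMIC_OPS[op](values[prev], operand)
        trace.append(f"{cur} = {op}({prev}, {operand}) = {values[cur]}")
    question = f"x0 = {values['x0']}; compute x{depth}."
    return question, trace, values[f"x{depth}"]

if __name__ == "__main__":
    # Deeper graphs (larger `depth`) give the extrapolation test cases.
    question, steps, answer = sample_problem(depth=4, seed=0)
    print(question)
    print("\n".join(steps))
    print("answer:", answer)
```

In this sketch, extrapolative generalization corresponds to evaluating on depths larger than those seen in training, and each intermediate step in `trace` can be checked independently, which is what makes the reasoning process verifiable.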
Key findings
Code
The code and data for this work will be released soon at the following GitHub repository: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning
Citation
If you find this work or code useful, please consider citing:
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}