On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Charlie Zhang, Graham Neubig, Xiang Yue
Carnegie Mellon University, Language Technologies Institute
Does Reinforcement Learning Truly Extend Reasoning?
This work addresses a long-standing disagreement about whether RL genuinely extends language models' reasoning abilities: some characterize RL as merely refining capabilities already present after pre-training, while others argue it induces new compositional skills. The disagreement persists largely because modern training pipelines offer little experimental control. Our work aims to resolve this conflict through a fully controlled analysis; beyond that analysis, this repository also provides the mid-training checkpoints used in the extrapolation tasks.
Overview
Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
- Extrapolative generalization to more complex compositions (deeper dependency graphs).
- Contextual generalization across diverse surface forms and linguistic contexts.
- How RL interacts with prior knowledge, and when it yields genuine capability gains beyond pre-training.
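To make the task setup concrete, below is a minimal, hypothetical sketch of one synthetic instance: a chain-structured dependency graph of atomic arithmetic operations whose depth controls compositional complexity, together with a step-by-step trace that can be verified programmatically. The names (`sample_problem`, `ATOMIC_OPS`), the specific operations, and the value ranges are illustrative assumptions, not the paper's actual task generator, which will ship with the code release.

```python
# Illustrative sketch only: a toy generator for a math-style reasoning task
# built from explicit atomic operations with a process-verifiable trace.
# Operation set, variable names, and ranges are assumptions for illustration.
import random

ATOMIC_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def sample_problem(depth, seed=None):
    """Build a chain-structured dependency graph of `depth` atomic steps."""
    rng = random.Random(seed)
    values = {"x0": rng.randint(1, 9)}   # leaf variable of the graph
    trace = []                            # process-verifiable reasoning steps
    for i in range(1, depth + 1):
        op = rng.choice(list(ATOMIC_OPS))
        operand = rng.randint(1, 9)
        prev, cur = f"x{i-1}", f"x{i}"
        values[cur] = ATOMIC_OPS[op](values[prev], operand)
        trace.append(f"{cur} = {op}({prev}, {operand}) = {values[cur]}")
    question = f"x0 = {values['x0']}; compute x{depth}."
    return question, trace, values[f"x{depth}"]

if __name__ == "__main__":
    # Deeper graphs (larger `depth`) give the extrapolation test cases.
    question, steps, answer = sample_problem(depth=4, seed=0)
    print(question)
    print("\n".join(steps))
    print("answer:", answer)
```

In this sketch, extrapolative generalization corresponds to evaluating on depths larger than those seen in training, and each intermediate step in `trace` can be checked independently, which is what makes the reasoning process verifiable.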
Key findings
Code
The code and data for this work will be released soon at the following GitHub repository: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning
Citation
If you find this work or code useful, please consider citing:
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}