# On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Charlie Zhang, Graham Neubig, Xiang Yue

Carnegie Mellon University, Language Technologies Institute


## Does Reinforcement Learning Truly Extend Reasoning?

This work examines the conflicting views on RL's effectiveness in extending language models' reasoning abilities: some characterize RL as a refiner of existing capabilities, while others see it as inducing genuinely new compositional skills. The disagreement persists largely because modern training pipelines offer little experimental control. Our work aims to resolve this conflict through controlled analysis; this repository contains the mid-training checkpoints used in the extrapolation tasks.

πŸ” Overview

Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces (see the sketch after this list), we study:

  • Extrapolative generalization to more complex compositions (deeper dependency graphs).
  • Contextual generalization across diverse surface forms and linguistic contexts.
  • How RL interacts with prior knowledge, and when it yields genuine capability gains beyond pre-training.
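
To make the setup concrete, here is a minimal, illustrative Python sketch of such a task: a chain-shaped dependency graph of atomic operations whose intermediate values form a process-verifiable trace. All names here (`ATOMIC_OPS`, `make_task`, `verify_trace`) are our own illustrative assumptions, not the paper's actual data-generation code.

```python
# Illustrative sketch only; not the paper's actual generator.
import random

# A toy inventory of atomic operations.
ATOMIC_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def make_task(depth: int, seed: int = 0):
    """Build a chain-shaped dependency graph of `depth` atomic operations.

    Deeper graphs correspond to more complex compositions, the axis
    along which extrapolative generalization can be probed.
    """
    rng = random.Random(seed)
    value = rng.randint(1, 9)
    # Each trace entry records (op, operand, resulting value).
    trace = [("init", None, value)]
    for _ in range(depth):
        op = rng.choice(sorted(ATOMIC_OPS))
        operand = rng.randint(1, 9)
        value = ATOMIC_OPS[op](value, operand)
        trace.append((op, operand, value))
    return trace, value

def verify_trace(trace) -> bool:
    """Process-level verification: recompute and check every
    intermediate step, not just the final answer."""
    _, _, value = trace[0]
    for op, operand, expected in trace[1:]:
        value = ATOMIC_OPS[op](value, operand)
        if value != expected:
            return False
    return True

if __name__ == "__main__":
    trace, answer = make_task(depth=4, seed=42)
    for step in trace:
        print(step)
    print("answer:", answer, "| verified:", verify_trace(trace))
```

Evaluating on a larger `depth` than seen during training is one simple way to operationalize the extrapolative axis above; verbalizing the same graph in different surface forms corresponds to the contextual axis.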

## 🧠 Key findings

You may also find a comic summary of the paper, generated with NotebookLM, [here](assets/Interplay-LM-Reasoning.pdf).

## Code

The code and data for this work will be released soon at the following GitHub repository: https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning

## πŸ“š Citation

If you find this work or code useful, please consider citing:

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```