Hi, are you using the Warmup-Stable and Merge (WSM) strategy to train this model, including in the RLVR stage?
We adopted the WSM LR scheduler in the pre-training stage of Ling-1T-base, but not in the RLVR stage.