RENT Model Details
This model was trained using RENT (Reinforcement Learning via Entropy Minimization), an unsupervised RL method that requires no external rewards or ground-truth labels. See our GitHub repo and paper for more information on how this model was trained.
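To make the idea of an entropy-based reward concrete, here is a minimal sketch in plain Python. It is not the training code: the function names (`token_entropy`, `entropy_reward`) are hypothetical, and averaging the negative per-token entropy over a response is an assumption made for illustration. The exact formulation used by RENT is described in the paper.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution for one position."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(per_token_logits):
    """Hypothetical reward: negative mean token entropy over a response.

    Assumption for illustration only: a more confident (lower-entropy)
    response receives a higher reward, with no labels involved.
    """
    entropies = [token_entropy(logits) for logits in per_token_logits]
    return -sum(entropies) / len(entropies)


# Toy example with a 4-token vocabulary and two generated positions.
uncertain = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # uniform distributions
confident = [[8.0, 0.0, 0.0, 0.0], [0.0, 8.0, 0.0, 0.0]]  # sharply peaked

print(entropy_reward(uncertain))
print(entropy_reward(confident))
```

In this toy setup the peaked distributions score higher than the uniform ones, which is the only property the sketch is meant to show.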
The base model used is Qwen2.5-7B-Instruct.
This model was trained using the AIME dataset.
When evaluating this model and the base model on AIME (64 runs on each model), we achieve the following results:
| Qwen2.5-7B-Instruct | RENT-Qwen-7B |
|---|---|
| 0.110 +/- 0.004 (std: 0.035) | 0.232 +/- 0.003 (std: 0.024) |
(Note that each cell reports the mean +/- standard error of the 64 scores the model achieves on AIME, with the standard deviation in parentheses.)
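For readers who want to reproduce this kind of summary from their own runs, the sketch below shows one way to compute it. The helper name `summarize_scores` and the example scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the card does not say which estimator was used. The reported numbers are consistent with stderr = std / sqrt(64): 0.035 / 8 ≈ 0.004 and 0.024 / 8 = 0.003.

```python
import math
import statistics


def summarize_scores(scores):
    """Return (mean, std, stderr) for a list of per-run scores.

    Assumption: std is the sample standard deviation (n - 1 denominator),
    and stderr is std divided by the square root of the number of runs.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    stderr = std / math.sqrt(n)
    return mean, std, stderr


# Hypothetical scores from four evaluation runs (not real results).
example_scores = [0.1, 0.2, 0.3, 0.4]
mean, std, stderr = summarize_scores(example_scores)
print(f"{mean:.3f} +/- {stderr:.3f} (std: {std:.3f})")
```

With 64 runs per model, as in the table above, the standard error is one eighth of the standard deviation.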
This checkpoint has not been trained, evaluated, or tested on any other dataset.
