RENT method

RENT Model Details

A model trained using RENT: Reinforcement Learning via Entropy Minimization - an unsupervised RL method that requires no external rewards or ground-truth labels. See our github repo and paper for more info on how this model was trained.

The base model used is Qwen2.5-7B-Instruct.

This model was trained using the aime dataset.

When evaluating this model and the base model on AIME (64 runs on each model), we achieve the following results:

Qwen2.5-7B-Instruct RENT-Qwen-7B
0.110 +/- 0.004 (std: 0.035) 0.232 +/- 0.003 (std: 0.024)

(Note that we report the mean and stderr of the 64 scores the model achieves on AIME)

This checkpoint has not been trained, evaluated, or tested on any other dataset.

Downloads last month
20
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for aippolit/RENT-Qwen-7B

Base model

Qwen/Qwen2.5-7B
Finetuned
(2129)
this model
Quantizations
2 models

Dataset used to train aippolit/RENT-Qwen-7B