RENT Model Details

A model trained using RENT: Reinforcement Learning via Entropy Minimization - an unsupervised RL method that requires no external rewards or ground-truth labels. See our github repo and paper for more info on how this model was trained.

The base model used is Qwen2.5-7B-Instruct.

This model was trained using the aime dataset.

When evaluating this model and the base model on AIME (64 runs on each model), we achieve the following results:

Qwen2.5-7B-Instruct	RENT-Qwen-7B
0.110 +/- 0.004 (std: 0.035)	0.232 +/- 0.003 (std: 0.024)

(Note that we report the mean and stderr of the 64 scores the model achieves on AIME)

This checkpoint has not been trained, evaluated, or tested on any other dataset.

Downloads last month: 20

Safetensors

Model size

8B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for aippolit/RENT-Qwen-7B

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Finetuned

(2129)

this model

Quantizations

2 models

aippolit
/

RENT-Qwen-7B

RENT Model Details

Model tree for aippolit/RENT-Qwen-7B

Dataset used to train aippolit/RENT-Qwen-7B