# RLCR

Part of the RLCR collection: models and datasets for *Beyond Binary Rewards: Training LMs to Reason about their Uncertainty*.
This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) on an unknown dataset. Its results on the evaluation set are reported in the training results table below.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The hyperparameters used during training are not listed in this card. More information needed.

### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.6944 | 0.08 | 25 | 0.6598 | 0.584 |
| 0.7264 | 0.16 | 50 | 0.7157 | 0.568 |
| 0.6153 | 0.24 | 75 | 0.6331 | 0.642 |
| 0.6774 | 0.32 | 100 | 0.5896 | 0.658 |
| 0.5579 | 0.40 | 125 | 0.5685 | 0.694 |
| 0.5981 | 0.48 | 150 | 0.5802 | 0.702 |
| 0.5214 | 0.56 | 175 | 0.6180 | 0.688 |
| 0.5363 | 0.64 | 200 | 0.6699 | 0.668 |
| 0.5309 | 0.72 | 225 | 0.5933 | 0.694 |
| 0.5541 | 0.80 | 250 | 0.5949 | 0.700 |
| 0.5562 | 0.88 | 275 | 0.5978 | 0.698 |
| 0.5539 | 0.96 | 300 | 0.5965 | 0.700 |
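For checkpoint selection, the table above can be scanned programmatically. A minimal sketch (the `(step, val_loss, accuracy)` triples are copied directly from the table; nothing else is assumed about the training run):

```python
# Pick the best checkpoint from the training results table,
# either by lowest validation loss or by highest accuracy.
rows = [
    (25, 0.6598, 0.584),
    (50, 0.7157, 0.568),
    (75, 0.6331, 0.642),
    (100, 0.5896, 0.658),
    (125, 0.5685, 0.694),
    (150, 0.5802, 0.702),
    (175, 0.6180, 0.688),
    (200, 0.6699, 0.668),
    (225, 0.5933, 0.694),
    (250, 0.5949, 0.700),
    (275, 0.5978, 0.698),
    (300, 0.5965, 0.700),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy

print(best_by_loss)  # (125, 0.5685, 0.694) -> step 125
print(best_by_acc)   # (150, 0.5802, 0.702) -> step 150
```

Note that the two criteria disagree here: validation loss bottoms out at step 125, while accuracy peaks at step 150, so which checkpoint to prefer depends on the downstream metric.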