# qwen3_4_20250815_2115
This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.3617
- Map@3: 0.9362
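
Map@3 (mean average precision with a cutoff of 3) rewards placing the correct answer high in a ranked list of up to three predictions. The sketch below is a common implementation of this metric for the single-correct-label case; it is illustrative and not taken from this repository's evaluation code.

```python
# A minimal Map@3 scorer, assuming one correct label per example and up to
# three ranked predictions (best guess first). Illustrative sketch only.

def map_at_3(predictions: list[list[str]], labels: list[str]) -> float:
    """predictions[i] holds up to 3 candidate labels, best guess first."""
    score = 0.0
    for preds, label in zip(predictions, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == label:
                score += 1.0 / rank  # precision at the rank of the hit
                break
    return score / len(labels)

# Example: hit at rank 1, hit at rank 2, miss -> (1 + 0.5 + 0) / 3 = 0.5
print(map_at_3([["A"], ["B", "A"], ["C"]], ["A", "A", "A"]))
```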
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the equivalent `TrainingArguments` sketch after this list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
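
For reference, a minimal `transformers.TrainingArguments` sketch reproducing the hyperparameters above; the output directory and any settings not listed are assumptions, not taken from this run.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="qwen3_4_20250815_2115",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 x 8 = 64 effective train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```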
### Training results
| Training Loss | Epoch | Step | Validation Loss | Map@3 |
|---|---|---|---|---|
| 17.1482 | 0.0598 | 20 | 1.3088 | 0.7335 |
| 9.1505 | 0.1196 | 40 | 0.9723 | 0.7878 |
| 7.2646 | 0.1794 | 60 | 0.9710 | 0.8080 |
| 6.4976 | 0.2392 | 80 | 0.6940 | 0.8587 |
| 5.1933 | 0.2990 | 100 | 0.7117 | 0.8664 |
| 5.466 | 0.3587 | 120 | 0.6600 | 0.8688 |
| 4.6808 | 0.4185 | 140 | 0.5713 | 0.8862 |
| 4.4319 | 0.4783 | 160 | 0.5451 | 0.8912 |
| 4.3587 | 0.5381 | 180 | 0.5540 | 0.8920 |
| 4.4524 | 0.5979 | 200 | 0.5633 | 0.8931 |
| 4.1656 | 0.6577 | 220 | 0.5040 | 0.9027 |
| 4.3246 | 0.7175 | 240 | 0.4917 | 0.9072 |
| 4.2143 | 0.7773 | 260 | 0.4637 | 0.9132 |
| 3.5913 | 0.8371 | 280 | 0.4997 | 0.9020 |
| 4.1479 | 0.8969 | 300 | 0.4435 | 0.9169 |
| 3.8664 | 0.9567 | 320 | 0.4353 | 0.9184 |
| 3.511 | 1.0149 | 340 | 0.4341 | 0.9164 |
| 2.6883 | 1.0747 | 360 | 0.4209 | 0.9202 |
| 2.6956 | 1.1345 | 380 | 0.4466 | 0.9130 |
| 3.4252 | 1.1943 | 400 | 0.4149 | 0.9210 |
| 2.542 | 1.2541 | 420 | 0.4202 | 0.9195 |
| 2.7799 | 1.3139 | 440 | 0.4156 | 0.9199 |
| 2.6821 | 1.3737 | 460 | 0.3919 | 0.9260 |
| 2.3878 | 1.4335 | 480 | 0.4012 | 0.9217 |
| 2.4443 | 1.4933 | 500 | 0.3820 | 0.9249 |
| 2.5027 | 1.5531 | 520 | 0.3827 | 0.9251 |
| 2.3072 | 1.6129 | 540 | 0.3762 | 0.9306 |
| 2.6044 | 1.6726 | 560 | 0.3582 | 0.9309 |
| 2.5267 | 1.7324 | 580 | 0.3578 | 0.9313 |
| 2.4106 | 1.7922 | 600 | 0.3753 | 0.9306 |
| 2.7612 | 1.8520 | 620 | 0.3439 | 0.9328 |
| 2.2199 | 1.9118 | 640 | 0.3474 | 0.9345 |
| 2.1533 | 1.9716 | 660 | 0.3498 | 0.9371 |
| 2.2909 | 2.0299 | 680 | 0.3340 | 0.9354 |
| 1.4412 | 2.0897 | 700 | 0.3595 | 0.9339 |
| 1.332 | 2.1495 | 720 | 0.3803 | 0.9364 |
| 1.3479 | 2.2093 | 740 | 0.3617 | 0.9362 |
### Framework versions
- PEFT 0.17.0
- Transformers 4.55.2
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4
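
Since this checkpoint was trained with PEFT 0.17.0, it is presumably a LoRA-style adapter on top of the base model. Below is a minimal loading sketch; the repo id is hypothetical, and the causal-LM head is an assumption (if the adapter was trained with a sequence-classification head instead, which the Map@3 metric could suggest, use `AutoPeftModelForSequenceClassification`).

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical repo id; replace with the actual adapter location.
adapter_id = "your-username/qwen3_4_20250815_2115"

# AutoPeftModelForCausalLM loads the base model recorded in the adapter
# config (Qwen/Qwen3-4B) and applies the fine-tuned adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```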