# qwen3_4_20250812_2132
This model is a fine-tuned version of Qwen/Qwen3-4B on an unspecified dataset. It achieves the following results on the evaluation set (the Map@3 metric is sketched below):
- Loss: 0.3653
- Map@3: 0.9397
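
Map@3 (mean average precision at a cutoff of 3) rewards ranking the correct answer near the top of each example's three submitted predictions. Below is a minimal sketch of the metric in Python, assuming a single gold label per example; the actual evaluation script is not part of this card.

```python
def map_at_3(predictions, labels):
    """Mean average precision at 3 for single-label tasks.

    predictions: list of ranked guesses (top-3 labels) per example
    labels: list of gold labels, one per example
    """
    total = 0.0
    for preds, gold in zip(predictions, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == gold:
                total += 1.0 / rank  # precision at the rank of the single relevant item
                break
    return total / len(labels)

# Gold label ranked 1st, 2nd, and missed entirely:
print(map_at_3([["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]],
               ["a", "a", "x"]))  # (1 + 0.5 + 0) / 3 = 0.5
```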
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (reconstructed as a `TrainingArguments` sketch after the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
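
A hedged reconstruction of this configuration as `transformers.TrainingArguments` (argument names follow the Transformers 4.55 API; the original training script is not included in this card, and `output_dir` is illustrative):

```python
from transformers import TrainingArguments

# The effective batch size is
# per_device_train_batch_size * gradient_accumulation_steps = 8 * 8 = 64,
# matching total_train_batch_size above.
training_args = TrainingArguments(
    output_dir="qwen3_4_20250812_2132",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```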
### Training results

| Training Loss | Epoch | Step | Validation Loss | Map@3 |
|---|---|---|---|---|
| 23.3553 | 0.0598 | 20 | 2.0405 | 0.5869 |
| 14.0817 | 0.1196 | 40 | 1.1413 | 0.7757 |
| 7.9694 | 0.1794 | 60 | 0.8782 | 0.8333 |
| 6.5759 | 0.2392 | 80 | 0.7345 | 0.8571 |
| 5.4951 | 0.2990 | 100 | 0.6525 | 0.8714 |
| 4.9341 | 0.3587 | 120 | 0.6328 | 0.8735 |
| 4.3622 | 0.4185 | 140 | 0.5057 | 0.9034 |
| 3.8179 | 0.4783 | 160 | 0.5399 | 0.8955 |
| 4.2917 | 0.5381 | 180 | 0.4924 | 0.9077 |
| 4.0504 | 0.5979 | 200 | 0.4972 | 0.9057 |
| 3.6507 | 0.6577 | 220 | 0.4539 | 0.9115 |
| 3.9974 | 0.7175 | 240 | 0.4652 | 0.9110 |
| 3.7596 | 0.7773 | 260 | 0.4360 | 0.9201 |
| 3.3848 | 0.8371 | 280 | 0.4552 | 0.9123 |
| 3.6286 | 0.8969 | 300 | 0.4211 | 0.9188 |
| 3.5085 | 0.9567 | 320 | 0.4150 | 0.9206 |
| 3.1587 | 1.0149 | 340 | 0.3916 | 0.9217 |
| 2.5177 | 1.0747 | 360 | 0.4076 | 0.9225 |
| 2.3485 | 1.1345 | 380 | 0.3830 | 0.9269 |
| 3.0193 | 1.1943 | 400 | 0.4031 | 0.9215 |
| 2.2161 | 1.2541 | 420 | 0.3999 | 0.9281 |
| 2.5193 | 1.3139 | 440 | 0.3834 | 0.9279 |
| 2.5606 | 1.3737 | 460 | 0.3715 | 0.9297 |
| 2.3023 | 1.4335 | 480 | 0.3678 | 0.9298 |
| 2.2246 | 1.4933 | 500 | 0.3815 | 0.9307 |
| 2.2925 | 1.5531 | 520 | 0.3624 | 0.9299 |
| 2.1651 | 1.6129 | 540 | 0.3689 | 0.9319 |
| 2.2626 | 1.6726 | 560 | 0.3705 | 0.9290 |
| 2.2447 | 1.7324 | 580 | 0.3517 | 0.9330 |
| 2.4377 | 1.7922 | 600 | 0.3537 | 0.9333 |
| 2.5225 | 1.8520 | 620 | 0.3296 | 0.9363 |
| 2.048 | 1.9118 | 640 | 0.3334 | 0.9367 |
| 1.9818 | 1.9716 | 660 | 0.3481 | 0.9368 |
| 1.988 | 2.0299 | 680 | 0.3221 | 0.9397 |
| 1.3347 | 2.0897 | 700 | 0.3506 | 0.9387 |
| 1.1025 | 2.1495 | 720 | 0.3576 | 0.9392 |
| 1.2089 | 2.2093 | 740 | 0.3518 | 0.9397 |
| 1.2152 | 2.2691 | 760 | 0.3576 | 0.9404 |
| 1.1314 | 2.3288 | 780 | 0.3620 | 0.9382 |
| 1.1089 | 2.3886 | 800 | 0.3452 | 0.9397 |
| 1.0984 | 2.4484 | 820 | 0.3630 | 0.9396 |
| 1.0739 | 2.5082 | 840 | 0.3506 | 0.9406 |
| 0.8933 | 2.5680 | 860 | 0.3614 | 0.9405 |
| 0.8392 | 2.6278 | 880 | 0.3702 | 0.9399 |
| 1.2028 | 2.6876 | 900 | 0.3655 | 0.9401 |
| 0.8721 | 2.7474 | 920 | 0.3653 | 0.9397 |
### Framework versions

- PEFT 0.17.0
- Transformers 4.55.0
- PyTorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4
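
Since the PEFT version above indicates this checkpoint is an adapter rather than full weights, one plausible way to load it for inference is via `PeftModel` on top of the base model. This is a sketch, not the author's documented usage: it assumes a causal-LM head and a hypothetical adapter repo id `your-username/qwen3_4_20250812_2132`.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model this adapter was trained against.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# Hypothetical adapter location; replace with this checkpoint's actual repo id or path.
model = PeftModel.from_pretrained(base, "your-username/qwen3_4_20250812_2132")
model.eval()
```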