# qwen3_4_20250818_1941
This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.3340
- Map@3: 0.9375
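The card does not define how Map@3 (mean average precision at a cutoff of 3) is computed. A minimal sketch, assuming the common single-correct-label convention in which each example has exactly one correct answer and the model submits up to three ranked guesses; the helper name and convention are assumptions:

```python
def map_at_3(ranked_preds, labels):
    """Mean Average Precision @ 3, assuming one correct label per example.

    ranked_preds: list of up-to-3 ranked guesses per example
    labels: the single correct label for each example
    """
    score = 0.0
    for preds, label in zip(ranked_preds, labels):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == label:
                score += 1.0 / rank  # credit decays with the rank of the hit
                break
    return score / len(labels)

# Example: a top-1 hit, a top-2 hit, and a miss -> (1 + 0.5 + 0) / 3
print(map_at_3([["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]],
               ["a", "y", "m"]))  # 0.5
```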
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
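These hyperparameters map directly onto `transformers.TrainingArguments`. The sketch below shows one way to reproduce the setup; the LoRA configuration, datasets, and output path are assumptions, since the card specifies neither the training data nor the PEFT settings.

```python
# Minimal sketch of a matching training setup. The LoRA config and datasets
# are hypothetical; only the TrainingArguments values come from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

# Hypothetical adapter settings -- the card only confirms PEFT 0.17.0 was used.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32))

args = TrainingArguments(
    output_dir="qwen3_4_20250818_1941",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,  # 8 x 8 = effective batch size of 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",            # AdamW, betas=(0.9, 0.999), eps=1e-08
    seed=42,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```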
### Training results
| Training Loss | Epoch | Step | Validation Loss | Map@3 |
|---|---|---|---|---|
| 16.5429 | 0.0523 | 20 | 1.4557 | 0.7283 |
| 9.2865 | 0.1046 | 40 | 0.9736 | 0.8026 |
| 8.3639 | 0.1569 | 60 | 1.0031 | 0.7932 |
| 7.1456 | 0.2092 | 80 | 0.7150 | 0.8585 |
| 6.1949 | 0.2615 | 100 | 0.6272 | 0.8776 |
| 5.3446 | 0.3138 | 120 | 0.6454 | 0.8768 |
| 4.9297 | 0.3661 | 140 | 0.6001 | 0.8850 |
| 4.2539 | 0.4184 | 160 | 0.6017 | 0.8870 |
| 4.9359 | 0.4707 | 180 | 0.5601 | 0.8877 |
| 4.0852 | 0.5230 | 200 | 0.5453 | 0.8985 |
| 4.2137 | 0.5754 | 220 | 0.4796 | 0.9097 |
| 4.1494 | 0.6277 | 240 | 0.4894 | 0.9105 |
| 4.1857 | 0.6800 | 260 | 0.4618 | 0.9078 |
| 3.5215 | 0.7323 | 280 | 0.4672 | 0.9093 |
| 4.2297 | 0.7846 | 300 | 0.4450 | 0.9139 |
| 3.2632 | 0.8369 | 320 | 0.4476 | 0.9171 |
| 4.0446 | 0.8892 | 340 | 0.4467 | 0.9141 |
| 3.4267 | 0.9415 | 360 | 0.4137 | 0.9207 |
| 3.4374 | 0.9938 | 380 | 0.4655 | 0.9113 |
| 3.1897 | 1.0445 | 400 | 0.4886 | 0.9167 |
| 2.413 | 1.0968 | 420 | 0.4331 | 0.9232 |
| 2.7002 | 1.1491 | 440 | 0.4092 | 0.9242 |
| 2.7209 | 1.2014 | 460 | 0.3857 | 0.9278 |
| 2.6897 | 1.2537 | 480 | 0.4045 | 0.9260 |
| 2.3799 | 1.3060 | 500 | 0.3872 | 0.9310 |
| 2.7859 | 1.3583 | 520 | 0.4151 | 0.9229 |
| 2.6904 | 1.4106 | 540 | 0.3789 | 0.9313 |
| 2.4114 | 1.4629 | 560 | 0.3901 | 0.9302 |
| 2.6539 | 1.5152 | 580 | 0.3838 | 0.9330 |
| 2.4441 | 1.5675 | 600 | 0.3571 | 0.9348 |
| 2.086 | 1.6198 | 620 | 0.3667 | 0.9341 |
| 2.0958 | 1.6721 | 640 | 0.3498 | 0.9375 |
| 2.3942 | 1.7244 | 660 | 0.3753 | 0.9288 |
| 2.7639 | 1.7767 | 680 | 0.3384 | 0.9377 |
| 2.2673 | 1.8290 | 700 | 0.3267 | 0.9380 |
| 2.2347 | 1.8813 | 720 | 0.3378 | 0.9371 |
| 2.1848 | 1.9336 | 740 | 0.3271 | 0.9376 |
| 2.1091 | 1.9859 | 760 | 0.3330 | 0.9369 |
| 1.8355 | 2.0366 | 780 | 0.3340 | 0.9375 |
### Framework versions
- PEFT 0.17.0
- Transformers 4.55.2
- Pytorch 2.6.0+cu124
- Datasets 4.0.0
- Tokenizers 0.21.4
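Given the PEFT version above, the checkpoint is presumably a LoRA-style adapter rather than full model weights. A minimal loading sketch, assuming the adapter is published under this card's name (the actual hub path may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
# Hypothetical adapter id; replace with the actual hub path or local directory.
model = PeftModel.from_pretrained(base, "qwen3_4_20250818_1941")
```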