qwen3-4b-instruct-2507-3epochs-non-ood

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 on the deepcoder_partial_edits_non_ood_train dataset. It achieves the following results on the evaluation set:

Loss: 0.0361

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3.0

Training results

Training Loss	Epoch	Step	Validation Loss
0.0624	0.4779	100	0.0417
0.0461	0.9558	200	0.0352
0.0368	1.4301	300	0.0338
0.0429	1.9080	400	0.0323
0.0179	2.3823	500	0.0359
0.0204	2.8602	600	0.0360

Framework versions

Transformers 4.55.0
Pytorch 2.6.0+cu124
Datasets 3.6.0
Tokenizers 0.21.1

Downloads last month: 16

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for nreHieW/qwen3-4b-instruct-2507-3epochs-non-ood

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

(165)

this model

Evaluation results

Metadata error: specify a dataset to view leaderboard