```python
import unsloth
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None          # None lets Unsloth auto-detect float16 / bfloat16 for the GPU
load_in_4bit = True   # load the base model 4-bit quantized to reduce GPU memory
```

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-3-medium-4k-instruct",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
```

```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,               # Increased from 16 for better adaptation
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,      # Increased from 8 (1:1 ratio with r)
    lora_dropout = 0.4,   # Increased from 0.3
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)
```
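Not part of the original listing, but a quick sanity check at this point: `get_peft_model` returns a PEFT-wrapped model, so the standard trainable-parameter summary is available.

```python
# Reports trainable vs. total parameters, i.e. the size of the LoRA adapter
model.print_trainable_parameters()
```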

```python
import wandb

wandb.init(
    project = "Phi-3-CEFR-finetuning",
    config = {
        "model": "unsloth/Phi-3-medium-4k-instruct",
        "strategy": "gradient_checkpointing",
        "learning_rate": 2e-5,
        "batch_size": 8,   # Increased since we'll remove accumulation
    },
)
```

2. Modified Training Arguments

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset_transformed.shuffle(seed=3407),
    eval_dataset = val_dataset_transformed,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 8,   # Doubled from 4 (removes need for accumulation)
        gradient_accumulation_steps = 1,   # Set to 1 to avoid the warning
        warmup_ratio = 0.1,
        num_train_epochs = 2,
        learning_rate = 2e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 50,
        optim = "adamw_8bit",
        weight_decay = 0.2,
        lr_scheduler_type = "cosine",
        eval_strategy = "steps",
        eval_steps = 200,
        save_strategy = "steps",
        save_steps = 200,
        output_dir = "outputs",
        load_best_model_at_end = True,
        metric_for_best_model = "eval_loss",
        greater_is_better = False,
        seed = 3407,
        report_to = "wandb",
        run_name = "phi3-cefr-lora-v2",
        gradient_checkpointing = True,     # Added to compensate for removed accumulation
    ),
)
```
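The launch and save steps are not shown in this section; the following is a minimal sketch of how a run like this is typically started and the LoRA adapter persisted (the output directory name is a placeholder, not from the original run).

```python
trainer_stats = trainer.train()   # runs the 2-epoch fine-tune configured above

# Save the LoRA adapter weights and tokenizer (directory name is a placeholder)
model.save_pretrained("phi3-cefr-lora")
tokenizer.save_pretrained("phi3-cefr-lora")

wandb.finish()                    # close the W&B run started earlier
```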

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 200  | 1.559500      | 1.493544        |
| 400  | 1.470200      | 1.433533        |
| 600  | 1.384500      | 1.420088        |
| 800  | 1.405300      | 1.403799        |
| 1000 | 1.390000      | 1.403095        |
| 1200 | 1.330400      | 1.406791        |
| 1400 | 1.295400      | 1.404276        |
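Training loss keeps falling through step 1400 while validation loss bottoms out around 1.403 near steps 800–1000 and then drifts up, an early sign of overfitting; with `load_best_model_at_end = True` and `metric_for_best_model = "eval_loss"` the best checkpoint is kept regardless. A hedged sketch (not part of the original run) of stopping earlier once the metric stalls, using `transformers.EarlyStoppingCallback`:

```python
from transformers import EarlyStoppingCallback

# Stop if eval_loss fails to improve for 3 consecutive evaluations (every 200 steps here).
# Requires load_best_model_at_end=True and metric_for_best_model, both already set above.
trainer.add_callback(EarlyStoppingCallback(early_stopping_patience = 3))
```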

Phi-3 CEFR Fine-tuned Model

Fine-tuned to generate sentences at a requested CEFR level, with:

  • Epochs: 2
  • Learning rate: 2e-5
  • LoRA rank: 32

Evaluation results: {'eval_loss': 1.4030952453613281, 'eval_runtime': 71.1503, 'eval_samples_per_second': 38.398, 'eval_steps_per_second': 4.807, 'epoch': 0.7799442896935933}
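This dictionary is the kind of output returned by `trainer.evaluate()` on the validation set; the reported `eval_loss` (≈1.4031) matches the step-1000 row in the table above, consistent with the best checkpoint being the one kept. A minimal sketch, assuming the trainer defined earlier:

```python
metrics = trainer.evaluate()   # evaluates on val_dataset_transformed
print(metrics)                 # e.g. {'eval_loss': ..., 'eval_runtime': ..., ...}
```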

=== Sample Generation ===

```text
--- Example 1 ---
[Prompt]:    <|user|> Generate a CEFR A1 level sentence.<|end|>
[Generated]: Generate a CEFR A1 level sentence. The movie was released on DVD on 15 October 2013 .
[Expected]:  Do you need something to eat ?

--- Example 2 ---
[Prompt]:    <|user|> Generate a CEFR A1 level sentence.<|end|>
[Generated]: Generate a CEFR A1 level sentence. The movie was released on DVD on 15 October 2013 .
[Expected]:  Do you need something to eat ?

--- Example 3 ---
[Prompt]:    <|user|> Generate a CEFR A1 level sentence.<|end|>
[Generated]: Generate a CEFR A1 level sentence. The movie was released on DVD on 15 October 2013 .
[Expected]:  I have two fish in a bowl .

--- Example 4 ---
[Prompt]:    <|user|> Generate a CEFR A1 level sentence.<|end|>
[Generated]: Generate a CEFR A1 level sentence. The movie was released on DVD on 15 October 2013 .
[Expected]:  I have two fish in a bowl .

--- Example 5 ---
[Prompt]:    <|user|> Generate a CEFR A1 level sentence.<|end|>
[Generated]: Generate a CEFR A1 level sentence. The movie was released on DVD on 15 October 2013 .
[Expected]:  She wanted the grey coat .
```
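For completeness, a minimal inference sketch for prompts in the format shown above. The adapter repo id is this model's, but everything else is an illustrative assumption: the exact decoding settings used for the samples are not documented, and the `<|assistant|>` continuation tag follows the standard Phi-3 chat format rather than anything stated here.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Mr-FineTuner/Phi-3-medium-4k-instruct_2Epoch_ReductedPrompt_v2",  # this LoRA adapter
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)   # enable Unsloth's faster inference mode

prompt = "<|user|>\nGenerate a CEFR A1 level sentence.<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors = "pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens = 64)   # greedy decoding, for illustration only
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```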

Model tree: Mr-FineTuner/Phi-3-medium-4k-instruct_2Epoch_ReductedPrompt_v2 is a LoRA adapter for unsloth/Phi-3-medium-4k-instruct.