Built with Axolotl

Axolotl config (axolotl version 0.13.0.dev0):

base_model: Qwen/Qwen2.5-Coder-7B-Instruct
# Auto-upload to HuggingFace when done
hub_model_id: darwinkernelpanic/Qwen2.5-Coder-7B-Instruct-Luau  # Change this to your HF username
hub_strategy: every_save  # Uploads checkpoints as you train
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true

datasets:
  - path: darwinkernelpanic/luau_corpus_axolotl
    type: completion
    field_instruction: text  # Check the actual column names on HF
    field_output: completion   # Might be "text" or "code" — verify first

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/qwen-luau-finetune

sequence_len: 2048
sample_packing: true
eval_sample_packing: true

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

# Weights & Biases tracking (optional but clutch)
wandb_project: qwen-luau-finetune
wandb_entity:
wandb_watch:
wandb_name: qwen2.5-coder-7b-luau
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0003
bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

resume_from_checkpoint:
logging_steps: 10
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.01

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: false
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT

special_tokens:
  pad_token: "<|endoftext|>"
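
The config above can be launched with the Axolotl CLI. Below is a minimal sketch, not the exact command used for this run: it assumes axolotl and accelerate are installed and that the YAML above is saved as qwen-luau.yaml (the filename is an assumption).

# Minimal launch sketch; wraps the Axolotl CLI from Python.
# Assumptions: a recent Axolotl release that provides the "axolotl train"
# subcommand, and the config above saved as "qwen-luau.yaml".
import subprocess

# On some Axolotl versions the equivalent shell invocation is:
#   accelerate launch -m axolotl.cli.train qwen-luau.yaml
subprocess.run(["axolotl", "train", "qwen-luau.yaml"], check=True)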

Qwen2.5-Coder-7B-Instruct-Luau

This model is a fine-tuned version of Qwen/Qwen2.5-Coder-7B-Instruct on the darwinkernelpanic/luau_corpus_axolotl dataset. It achieves the following results on the evaluation set:

  • Loss: nan
  • Perplexity: nan
  • Max active memory (GiB): 14.12
  • Max allocated memory (GiB): 14.01
  • Device reserved memory (GiB): 14.69

Model description

The model was fine-tuned on the Roblox/luau_corpus dataset, converted so that its "prompt" column is renamed to "text" for compatibility reasons. The goal is improved knowledge of and performance on Luau code (Roblox's Lua dialect, see luau.org), which should in turn improve generated code quality for Luau and Roblox projects.
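
For reference, the conversion described above is essentially a column rename. A minimal sketch with the datasets library follows; the column names and the push_to_hub step are assumptions based on the description above, not the exact script that was used.

# Minimal sketch of the dataset conversion described above (column names assumed).
from datasets import load_dataset

ds = load_dataset("Roblox/luau_corpus")
# Rename "prompt" -> "text" so the Axolotl config above can reference the field.
ds = ds.rename_column("prompt", "text")
# Hypothetical upload step to the dataset repo named in the config.
ds.push_to_hub("darwinkernelpanic/luau_corpus_axolotl")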

Intended uses & limitations

This model is intended for use in applications that use the Luau programming language, including but not limited to:

  • Roblox projects
  • Standalone Luau projects (e.g. Lune)

It may have limitations for projects that:

  • Use other programming languages
  • Use standard Lua rather than Luau
  • Are not programming-related
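
A minimal inference sketch follows. It assumes the repository hosts a PEFT (LoRA) adapter that is loaded on top of the base model; the prompt and generation settings are illustrative only.

# Minimal inference sketch (assumes transformers and peft are installed).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "darwinkernelpanic/Qwen2.5-Coder-7B-Instruct-Luau"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

# Illustrative completion-style prompt (the adapter was trained on raw Luau text).
prompt = "-- Luau: deep-copy a table\nlocal function deepCopy(t)\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))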

Training and evaluation data

The model was trained on the darwinkernelpanic/luau_corpus_axolotl dataset (a conversion of Roblox/luau_corpus), with 5% of the data held out as the evaluation split (val_set_size: 0.05).

Training procedure

Trained on 2x NVIDIA RTX 4090s

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 4
  • optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 105
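
The total batch sizes above follow directly from the per-device batch size, gradient accumulation, and device count; a quick arithmetic check:

# Effective batch size check for the values listed above.
micro_batch_size = 2        # per-device train batch size
gradient_accumulation = 2   # gradient_accumulation_steps
num_devices = 2             # 2x RTX 4090

total_train_batch_size = micro_batch_size * gradient_accumulation * num_devices
total_eval_batch_size = 2 * num_devices  # no gradient accumulation at eval time
print(total_train_batch_size, total_eval_batch_size)  # 8 4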

Training results

Training Loss   Epoch    Step   Validation Loss   Perplexity   Active (GiB)   Allocated (GiB)   Reserved (GiB)
No log          0        0      3.9969            54.428       11.21          11.1              12.26
No log          0.2535   9      nan               nan          14.12          14.01             15.56
12.4054         0.5070   18     nan               nan          14.12          14.01             14.69
0.0             0.7606   27     nan               nan          14.12          14.01             14.69
0.0             1.0      36     nan               nan          14.12          14.01             14.69
0.0             1.2535   45     nan               nan          14.12          14.01             14.69
0.0             1.5070   54     nan               nan          14.12          14.01             14.69
0.0             1.7606   63     nan               nan          14.12          14.01             14.69
0.0             2.0      72     nan               nan          14.12          14.01             14.69
0.0             2.2535   81     nan               nan          14.12          14.01             14.69
0.0             2.5070   90     nan               nan          11.83          11.72             14.69
0.0             2.7606   99     nan               nan          14.12          14.01             14.69

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1