Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct

load_in_8bit: false
load_in_4bit: true
strict: false

adapter: lora

datasets:
  - path: AIPixelMedia/astrid-dataset
    data_files: "*formatted.jsonl"
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/astrid-llama-8b

sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
flash_attention: true

seed: 35
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 20
optimizer: paged_adamw_32bit
learning_rate: 2e-5
lr_scheduler: cosine

lora_r: 16
lora_alpha: 32
lora_dropout: 0.2
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
lora_modules_to_save:
  - lm_head
merge_lora: false
save_safetensors: true

train_on_inputs: false
group_by_length: true
bf16: auto
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

eval_steps: 5
save_steps: 100
early_stopping_patience: 2
logging_steps: 5
warmup_steps: 10
weight_decay: 0.01

special_tokens:
  pad_token: "<|end_of_text|>"
```
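
This config is normally launched with the Axolotl CLI (for example `axolotl train config.yaml`). For readers more familiar with the Hugging Face PEFT/bitsandbytes APIs than with Axolotl's config keys, the quantization and LoRA settings above map roughly onto the sketch below. The NF4 quant type and bfloat16 compute dtype are assumptions not pinned by the config, and Axolotl's internal setup may differ in detail.

```python
# Rough PEFT/bitsandbytes equivalent of the quantization and LoRA settings above.
# Illustrative sketch only, not the exact code Axolotl runs internally.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-3.1-8B-Instruct"

# load_in_4bit: true -> 4-bit quantization of the base model (QLoRA-style).
# NF4 and bfloat16 compute dtype are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = "<|end_of_text|>"  # special_tokens.pad_token from the config

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# lora_r / lora_alpha / lora_dropout / lora_target_modules / lora_modules_to_save
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.2,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```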

outputs/astrid-llama-8b

This model is a LoRA fine-tune of meta-llama/Llama-3.1-8B-Instruct, trained on AIPixelMedia/astrid-dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3918
  • Max active memory: 12.13 GiB
  • Max allocated memory: 12.13 GiB
  • Reserved memory: 16.52 GiB

Model description

This repository contains a LoRA adapter (rank 16, alpha 32, dropout 0.2) for meta-llama/Llama-3.1-8B-Instruct, trained with Axolotl while loading the base model in 4-bit (QLoRA-style). The adapter targets the attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj) and additionally saves the lm_head weights. The adapter is not merged into the base model.

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on the AIPixelMedia/astrid-dataset dataset (files matching *formatted.jsonl), formatted as Alpaca-style instruction/response pairs. 10% of the data was held out as the evaluation split (val_set_size: 0.1), with seed 35.
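
A rough way to reproduce the dataset loading and 10% evaluation split with the `datasets` library is sketched below; Axolotl performs its own split internally, so the exact partition may differ.

```python
# Illustrative only: approximate the dataset loading and 10% eval split described above.
from datasets import load_dataset

dataset = load_dataset(
    "AIPixelMedia/astrid-dataset",
    data_files="*formatted.jsonl",  # data_files pattern from the config
    split="train",
)

# val_set_size: 0.1 with seed: 35
splits = dataset.train_test_split(test_size=0.1, seed=35)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))
```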

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 35
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8 (micro_batch_size 2 × gradient_accumulation_steps 4; see the sketch after this list)
  • optimizer: PagedAdamW 32-bit (paged_adamw_32bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 40
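
For reference, the effective batch size and the cosine schedule with warmup can be reconstructed roughly as follows. The plain `torch.optim.AdamW` and the tiny stand-in module are placeholders (the actual run used the paged 32-bit AdamW inside the trainer); the scheduler call itself uses the real `transformers` helper.

```python
# Sketch: effective batch size and cosine LR schedule implied by the values above.
import torch
from transformers import get_cosine_schedule_with_warmup

micro_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-device run
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 8

model = torch.nn.Linear(8, 8)  # stand-in module, for illustration only
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,    # lr_scheduler_warmup_steps
    num_training_steps=40,  # training_steps
)
```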

Training results

| Training Loss | Epoch   | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:-------:|:----:|:---------------:|:------------:|:---------------:|:--------------:|
| No log        | 0       | 0    | 3.2546          | 11.95        | 11.95           | 12.15          |
| 3.1725        | 1.8889  | 5    | 3.2149          | 12.13        | 12.13           | 17.46          |
| 3.1171        | 3.4444  | 10   | 2.9429          | 12.13        | 12.13           | 16.52          |
| 2.6355        | 5.0     | 15   | 2.6398          | 12.13        | 12.13           | 16.52          |
| 2.3752        | 6.8889  | 20   | 2.5206          | 12.13        | 12.13           | 16.52          |
| 2.1869        | 8.4444  | 25   | 2.4464          | 12.13        | 12.13           | 16.52          |
| 2.0751        | 10.0    | 30   | 2.4187          | 12.13        | 12.13           | 16.52          |
| 2.0616        | 11.8889 | 35   | 2.4084          | 12.13        | 12.13           | 16.52          |
| 2.0263        | 13.4444 | 40   | 2.3918          | 12.13        | 12.13           | 16.52          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.57.0
  • Pytorch 2.7.1+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1
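
A minimal inference sketch follows, assuming the adapter is published as AIPixelMedia/astrid and that you have access to the gated meta-llama/Llama-3.1-8B-Instruct weights. Note that the adapter was trained on Alpaca-formatted prompts, so the chat-template prompting shown here is an approximation; an Alpaca-style prompt may track the training format more closely.

```python
# Minimal inference sketch: load the 4-bit base model and attach this LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "meta-llama/Llama-3.1-8B-Instruct"
adapter_id = "AIPixelMedia/astrid"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```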