Built with Axolotl

Axolotl config (axolotl version: 0.4.1)

adapter: lora
base_model: fxmarty/tiny-dummy-qwen2
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 7b7d4c2e25bfcfaa_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/7b7d4c2e25bfcfaa_train_data.json
  type:
    field_input: span_labels
    field_instruction: source_text
    field_output: target_text
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 400
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/06162734-d351-41bb-82e7-47e4efe3d9c9
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 142749
micro_batch_size: 2
mlflow_experiment_name: /tmp/7b7d4c2e25bfcfaa_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 400
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.022261303176687963
wandb_entity: null
wandb_mode: online
wandb_name: b4837e4a-e96f-46cd-948c-e22b70f0c278
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: b4837e4a-e96f-46cd-948c-e22b70f0c278
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
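
The datasets entry above maps each JSON record through Axolotl's custom format strings: source_text fills {instruction}, span_labels fills {input}, and target_text is the completion. Below is a minimal sketch of that prompt rendering, assuming record keys as in the config; the helper function and example record are illustrative, not Axolotl's internal code.

```python
# Format strings copied from the config above; the rendering helper itself
# is an illustrative assumption, not Axolotl's internal implementation.
FORMAT = "{instruction} {input}"
NO_INPUT_FORMAT = "{instruction}"

def render_prompt(record: dict) -> str:
    """Map one JSON record onto the prompt fields named in the config."""
    instruction = record["source_text"]   # field_instruction
    inp = record.get("span_labels")       # field_input
    if inp:
        return FORMAT.format(instruction=instruction, input=inp)
    return NO_INPUT_FORMAT.format(instruction=instruction)

# Hypothetical record with the same keys expected in 7b7d4c2e25bfcfaa_train_data.json
example = {
    "source_text": "Label the spans in the sentence:",
    "span_labels": "The cat sat on the mat.",
    "target_text": "...",  # completion; not part of the prompt
}
print(render_prompt(example))
```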

06162734-d351-41bb-82e7-47e4efe3d9c9

This model is a LoRA adapter fine-tuned from fxmarty/tiny-dummy-qwen2 on the dataset specified in the configuration above. It achieves the following results on the evaluation set:

  • Loss: 11.8945
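
For reference, here is a minimal sketch of loading this adapter for evaluation with Transformers and PEFT, assuming the base_model and hub_model_id from the config above; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "fxmarty/tiny-dummy-qwen2"                           # base_model from the config
adapter_id = "Alphatao/06162734-d351-41bb-82e7-47e4efe3d9c9"   # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Label the spans in the sentence: The cat sat on the mat.",
                   return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens in the config
print(tokenizer.decode(out[0], skip_special_tokens=True))
```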

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments (see the sketch after this list)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 54902
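
A minimal sketch of the optimizer and schedule implied by these values, assuming bitsandbytes' 8-bit AdamW and Transformers' cosine warmup schedule; the tiny stand-in model is a placeholder for the PEFT-wrapped model.

```python
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Effective batch size: micro_batch_size (2) x gradient_accumulation_steps (4) = 8,
# matching total_train_batch_size above (a single data-parallel rank assumed).
micro_batch_size, grad_accum = 2, 4
assert micro_batch_size * grad_accum == 8

model = torch.nn.Linear(8, 8)     # stand-in for the PEFT-wrapped model
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,                      # learning_rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,             # weight_decay
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,          # lr_scheduler_warmup_steps
    num_training_steps=54902,     # training_steps reported above
)
```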

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 11.931 | 0.0000 | 1 | 11.9320 |
| 11.908 | 0.0146 | 400 | 11.9114 |
| 11.914 | 0.0291 | 800 | 11.9078 |
| 11.9067 | 0.0437 | 1200 | 11.9058 |
| 11.9008 | 0.0583 | 1600 | 11.9040 |
| 11.911 | 0.0729 | 2000 | 11.9033 |
| 11.9088 | 0.0874 | 2400 | 11.9021 |
| 11.9003 | 0.1020 | 2800 | 11.9015 |
| 11.9191 | 0.1166 | 3200 | 11.9009 |
| 11.8979 | 0.1311 | 3600 | 11.9004 |
| 11.8947 | 0.1457 | 4000 | 11.8999 |
| 11.9035 | 0.1603 | 4400 | 11.8998 |
| 11.9037 | 0.1749 | 4800 | 11.8993 |
| 11.8862 | 0.1894 | 5200 | 11.8990 |
| 11.8968 | 0.2040 | 5600 | 11.8987 |
| 11.8981 | 0.2186 | 6000 | 11.8983 |
| 11.9071 | 0.2331 | 6400 | 11.8981 |
| 11.8955 | 0.2477 | 6800 | 11.8977 |
| 11.9008 | 0.2623 | 7200 | 11.8975 |
| 11.8946 | 0.2769 | 7600 | 11.8974 |
| 11.8937 | 0.2914 | 8000 | 11.8972 |
| 11.8924 | 0.3060 | 8400 | 11.8969 |
| 11.8988 | 0.3206 | 8800 | 11.8968 |
| 11.9037 | 0.3351 | 9200 | 11.8968 |
| 11.8975 | 0.3497 | 9600 | 11.8966 |
| 11.906 | 0.3643 | 10000 | 11.8966 |
| 11.9076 | 0.3789 | 10400 | 11.8963 |
| 11.9063 | 0.3934 | 10800 | 11.8961 |
| 11.8964 | 0.4080 | 11200 | 11.8961 |
| 11.9075 | 0.4226 | 11600 | 11.8959 |
| 11.9032 | 0.4371 | 12000 | 11.8958 |
| 11.9125 | 0.4517 | 12400 | 11.8955 |
| 11.8889 | 0.4663 | 12800 | 11.8955 |
| 11.9002 | 0.4809 | 13200 | 11.8954 |
| 11.897 | 0.4954 | 13600 | 11.8953 |
| 11.9072 | 0.5100 | 14000 | 11.8952 |
| 11.9026 | 0.5246 | 14400 | 11.8951 |
| 11.899 | 0.5391 | 14800 | 11.8951 |
| 11.9073 | 0.5537 | 15200 | 11.8949 |
| 11.9005 | 0.5683 | 15600 | 11.8948 |
| 11.8986 | 0.5829 | 16000 | 11.8950 |
| 11.8816 | 0.5974 | 16400 | 11.8948 |
| 11.8973 | 0.6120 | 16800 | 11.8948 |
| 11.8944 | 0.6266 | 17200 | 11.8947 |
| 11.8867 | 0.6411 | 17600 | 11.8947 |
| 11.884 | 0.6557 | 18000 | 11.8945 |
| 11.8971 | 0.6703 | 18400 | 11.8945 |
| 11.9007 | 0.6849 | 18800 | 11.8945 |
| 11.8896 | 0.6994 | 19200 | 11.8945 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
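
For reference, a PEFT LoraConfig mirroring the adapter settings in the configuration above might look like the sketch below. Note that lora_target_linear: true makes Axolotl target all linear layers, so the explicit module list here is only an approximation.

```python
from peft import LoraConfig, TaskType

# Sketch of the adapter hyperparameters above; with lora_target_linear: true,
# Axolotl expands the target list to every linear layer, so listing only the
# attention projections here is an approximation.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # lora_r
    lora_alpha=32,        # lora_alpha
    lora_dropout=0.05,    # lora_dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```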