Built with Axolotl

Axolotl config (axolotl version: 0.4.1)

adapter: lora
base_model: fxmarty/tiny-dummy-qwen2
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 7b7d4c2e25bfcfaa_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/7b7d4c2e25bfcfaa_train_data.json
  type:
    field_input: span_labels
    field_instruction: source_text
    field_output: target_text
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
device_map:
  '': 0,1,2,3,4,5,6,7
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 400
eval_table_size: null
flash_attention: true
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: Alphatao/06162734-d351-41bb-82e7-47e4efe3d9c9
hub_repo: null
hub_strategy: null
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 16
lora_target_linear: true
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 142749
micro_batch_size: 2
mlflow_experiment_name: /tmp/7b7d4c2e25bfcfaa_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 400
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.022261303176687963
wandb_entity: null
wandb_mode: online
wandb_name: b4837e4a-e96f-46cd-948c-e22b70f0c278
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: b4837e4a-e96f-46cd-948c-e22b70f0c278
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
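
The datasets entry above maps each JSON record through Axolotl's custom format strings: source_text fills {instruction}, span_labels fills {input}, and target_text is the completion. Below is a minimal sketch of that prompt rendering, assuming record keys as in the config; the helper function and example record are illustrative, not Axolotl's internal code.

```python
# Format strings copied from the config above; the rendering helper itself
# is an illustrative assumption, not Axolotl's internal implementation.
FORMAT = "{instruction} {input}"
NO_INPUT_FORMAT = "{instruction}"

def render_prompt(record: dict) -> str:
    """Map one JSON record onto the prompt fields named in the config."""
    instruction = record["source_text"]   # field_instruction
    inp = record.get("span_labels")       # field_input
    if inp:
        return FORMAT.format(instruction=instruction, input=inp)
    return NO_INPUT_FORMAT.format(instruction=instruction)

# Hypothetical record with the same keys expected in 7b7d4c2e25bfcfaa_train_data.json
example = {
    "source_text": "Label the spans in the sentence:",
    "span_labels": "The cat sat on the mat.",
    "target_text": "...",  # completion; not part of the prompt
}
print(render_prompt(example))
```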

06162734-d351-41bb-82e7-47e4efe3d9c9

This model is a LoRA adapter fine-tuned from fxmarty/tiny-dummy-qwen2 on the dataset specified in the configuration above. It achieves the following results on the evaluation set:

  • Loss: 11.8945
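
For reference, here is a minimal sketch of loading this adapter for evaluation with Transformers and PEFT, assuming the base_model and hub_model_id from the config above; the prompt and generation settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "fxmarty/tiny-dummy-qwen2"                           # base_model from the config
adapter_id = "Alphatao/06162734-d351-41bb-82e7-47e4efe3d9c9"   # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Label the spans in the sentence: The cat sat on the mat.",
                   return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)  # eval_max_new_tokens in the config
print(tokenizer.decode(out[0], skip_special_tokens=True))
```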

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments (see the sketch after this list)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 54902
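
A minimal sketch of the optimizer and schedule implied by these values, assuming bitsandbytes' 8-bit AdamW and Transformers' cosine warmup schedule; the tiny stand-in model is a placeholder for the PEFT-wrapped model.

```python
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Effective batch size: micro_batch_size (2) x gradient_accumulation_steps (4) = 8,
# matching total_train_batch_size above (a single data-parallel rank assumed).
micro_batch_size, grad_accum = 2, 4
assert micro_batch_size * grad_accum == 8

model = torch.nn.Linear(8, 8)     # stand-in for the PEFT-wrapped model
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,                      # learning_rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,             # weight_decay
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,          # lr_scheduler_warmup_steps
    num_training_steps=54902,     # training_steps reported above
)
```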

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 11.931 | 0.0000 | 1 | 11.9320 |
| 11.908 | 0.0146 | 400 | 11.9114 |
| 11.914 | 0.0291 | 800 | 11.9078 |
| 11.9067 | 0.0437 | 1200 | 11.9058 |
| 11.9008 | 0.0583 | 1600 | 11.9040 |
| 11.911 | 0.0729 | 2000 | 11.9033 |
| 11.9088 | 0.0874 | 2400 | 11.9021 |
| 11.9003 | 0.1020 | 2800 | 11.9015 |
| 11.9191 | 0.1166 | 3200 | 11.9009 |
| 11.8979 | 0.1311 | 3600 | 11.9004 |
| 11.8947 | 0.1457 | 4000 | 11.8999 |
| 11.9035 | 0.1603 | 4400 | 11.8998 |
| 11.9037 | 0.1749 | 4800 | 11.8993 |
| 11.8862 | 0.1894 | 5200 | 11.8990 |
| 11.8968 | 0.2040 | 5600 | 11.8987 |
| 11.8981 | 0.2186 | 6000 | 11.8983 |
| 11.9071 | 0.2331 | 6400 | 11.8981 |
| 11.8955 | 0.2477 | 6800 | 11.8977 |
| 11.9008 | 0.2623 | 7200 | 11.8975 |
| 11.8946 | 0.2769 | 7600 | 11.8974 |
| 11.8937 | 0.2914 | 8000 | 11.8972 |
| 11.8924 | 0.3060 | 8400 | 11.8969 |
| 11.8988 | 0.3206 | 8800 | 11.8968 |
| 11.9037 | 0.3351 | 9200 | 11.8968 |
| 11.8975 | 0.3497 | 9600 | 11.8966 |
| 11.906 | 0.3643 | 10000 | 11.8966 |
| 11.9076 | 0.3789 | 10400 | 11.8963 |
| 11.9063 | 0.3934 | 10800 | 11.8961 |
| 11.8964 | 0.4080 | 11200 | 11.8961 |
| 11.9075 | 0.4226 | 11600 | 11.8959 |
| 11.9032 | 0.4371 | 12000 | 11.8958 |
| 11.9125 | 0.4517 | 12400 | 11.8955 |
| 11.8889 | 0.4663 | 12800 | 11.8955 |
| 11.9002 | 0.4809 | 13200 | 11.8954 |
| 11.897 | 0.4954 | 13600 | 11.8953 |
| 11.9072 | 0.5100 | 14000 | 11.8952 |
| 11.9026 | 0.5246 | 14400 | 11.8951 |
| 11.899 | 0.5391 | 14800 | 11.8951 |
| 11.9073 | 0.5537 | 15200 | 11.8949 |
| 11.9005 | 0.5683 | 15600 | 11.8948 |
| 11.8986 | 0.5829 | 16000 | 11.8950 |
| 11.8816 | 0.5974 | 16400 | 11.8948 |
| 11.8973 | 0.6120 | 16800 | 11.8948 |
| 11.8944 | 0.6266 | 17200 | 11.8947 |
| 11.8867 | 0.6411 | 17600 | 11.8947 |
| 11.884 | 0.6557 | 18000 | 11.8945 |
| 11.8971 | 0.6703 | 18400 | 11.8945 |
| 11.9007 | 0.6849 | 18800 | 11.8945 |
| 11.8896 | 0.6994 | 19200 | 11.8945 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
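
For reference, a PEFT LoraConfig mirroring the adapter settings in the configuration above might look like the sketch below. Note that lora_target_linear: true makes Axolotl target all linear layers, so the explicit module list here is only an approximation.

```python
from peft import LoraConfig, TaskType

# Sketch of the adapter hyperparameters above; with lora_target_linear: true,
# Axolotl expands the target list to every linear layer, so listing only the
# attention projections here is an approximation.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # lora_r
    lora_alpha=32,        # lora_alpha
    lora_dropout=0.05,    # lora_dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```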