error577 committed
Commit efe542f · verified · 1 parent: 21bc074

End of training

Files changed (2):
  1. README.md +17 -17
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -66,7 +66,7 @@ lora_r: 8
  lora_target_linear: true
  lr_scheduler: cosine
  max_steps: 10
- micro_batch_size: 1
+ micro_batch_size: 2
  mlflow_experiment_name: /tmp/db70e57b42c5fff2_train_data.json
  model_type: AutoModelForCausalLM
  num_epochs: 4
@@ -92,7 +92,7 @@ wandb_name: 3cfb546e-4735-4a27-9a50-f4d92ba35258
  wandb_project: Gradients-On-Demand
  wandb_run: your_name
  wandb_runid: 3cfb546e-4735-4a27-9a50-f4d92ba35258
- warmup_steps: 5
+ warmup_steps: 10
  weight_decay: 0.0
  xformers_attention: null

@@ -104,7 +104,7 @@ xformers_attention: null

  This model is a fine-tuned version of [NousResearch/CodeLlama-7b-hf-flash](https://huggingface.co/NousResearch/CodeLlama-7b-hf-flash) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3627
+ - Loss: 0.4006

  ## Model description

@@ -124,30 +124,30 @@ More information needed

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 1
- - eval_batch_size: 1
+ - train_batch_size: 2
+ - eval_batch_size: 2
  - seed: 42
  - gradient_accumulation_steps: 16
- - total_train_batch_size: 16
+ - total_train_batch_size: 32
  - optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 5
+ - lr_scheduler_warmup_steps: 10
  - training_steps: 10

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 14.6764 | 0.0048 | 1 | 1.0043 |
- | 16.0244 | 0.0096 | 2 | 0.9961 |
- | 17.5417 | 0.0144 | 3 | 0.9555 |
- | 17.311 | 0.0192 | 4 | 0.8481 |
- | 12.0214 | 0.0240 | 5 | 0.6824 |
- | 13.3294 | 0.0288 | 6 | 0.5323 |
- | 8.38 | 0.0337 | 7 | 0.4619 |
- | 5.5358 | 0.0385 | 8 | 0.3950 |
- | 6.8075 | 0.0433 | 9 | 0.3684 |
- | 5.2666 | 0.0481 | 10 | 0.3627 |
+ | 14.6966 | 0.0096 | 1 | 0.9651 |
+ | 17.2403 | 0.0192 | 2 | 0.9630 |
+ | 15.8582 | 0.0288 | 3 | 0.9526 |
+ | 14.0816 | 0.0385 | 4 | 0.9209 |
+ | 14.2308 | 0.0481 | 5 | 0.8555 |
+ | 14.0635 | 0.0577 | 6 | 0.7542 |
+ | 12.24 | 0.0673 | 7 | 0.6446 |
+ | 9.584 | 0.0769 | 8 | 0.5435 |
+ | 7.7745 | 0.0865 | 9 | 0.4615 |
+ | 5.8718 | 0.0962 | 10 | 0.4006 |


  ### Framework versions
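
This commit regenerates the model card after a retrained run: micro_batch_size and warmup_steps are doubled, and the derived fields follow. The total_train_batch_size the card reports is the per-device micro batch multiplied by the gradient-accumulation steps, which is also why the epoch column now advances by roughly 0.0096 per step instead of 0.0048. A minimal sketch of that arithmetic and of the warmup-then-cosine learning-rate curve, assuming the scheduler has the usual shape of transformers' get_cosine_schedule_with_warmup (the card only names "cosine", so that mapping is an assumption):

import math

micro_batch_size = 2                 # per-device batch size after this commit
gradient_accumulation_steps = 16     # unchanged between the two runs
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 32  # the value the regenerated card reports

def lr_at_step(step, warmup_steps=10, training_steps=10, base_lr=2e-4):
    # Linear warmup to base_lr, then cosine decay to zero. The exact
    # scheduler class is an assumption; only "cosine" is in the card.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, training_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With warmup_steps == training_steps == 10, all ten optimizer steps
# (indices 0-9) sit on the warmup ramp, so base_lr is never reached.
print([f"{lr_at_step(s):.1e}" for s in range(10)])

Under that assumption, the whole run trains inside the warmup ramp and the cosine phase never engages; the doubled warmup_steps therefore halves the learning rate actually applied at each step.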
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:52a9004528fef0fedc5a7c55c1ce2548f6b138ffd8ca89a686ff9fcda8cb84b3
+ oid sha256:72948994e561007d62010d55858b35ff4eb46ae5275169f7fd80b33c4aeb1f29
  size 80115210
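
The adapter_model.bin entry is a Git LFS pointer, so only the recorded SHA-256 changes; the size stays byte-identical at 80115210, consistent with unchanged LoRA shapes. A minimal sketch for checking a downloaded copy of the weights against the new pointer, assuming the actual file has already been fetched locally:

import hashlib

def matches_lfs_pointer(path, expected_oid, expected_size):
    # Stream the file and recompute the SHA-256 the LFS pointer records,
    # then compare both the digest and the byte count to the pointer fields.
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == expected_oid and size == expected_size

# Fields taken from the pointer introduced by this commit:
print(matches_lfs_pointer(
    "adapter_model.bin",
    "72948994e561007d62010d55858b35ff4eb46ae5275169f7fd80b33c4aeb1f29",
    80115210,
))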