04/02/2025 07:09:10 - INFO  - Train data file: finetuning_data_25_sentences.json
04/02/2025 07:09:10 - INFO  - Output Directory: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:09:10 - INFO  - Experiment name: KETI_b1_s4_e3
04/02/2025 07:09:10 - INFO  - torch_dtype: torch.bfloat16
04/02/2025 07:09:10 - INFO  - 🔍 Start inference on base model: google/gemma-3-27b-it
04/02/2025 07:10:10 - INFO  - ✅ base_model and tokenizer successfully released from memory
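The line above (translated from Korean) reports that the baseline model and tokenizer were freed after the pre-finetuning inference pass. A minimal sketch of how such a release is typically done in PyTorch (the script itself is not shown, so the variable names are assumptions):

```python
import gc
import torch

# ... baseline inference with the base model has finished ...
del base_model, tokenizer  # hypothetical names: drop the last references
gc.collect()               # reclaim the Python-side objects
torch.cuda.empty_cache()   # return cached CUDA memory to the driver
```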
04/02/2025 07:10:10 - INFO  - Using 6 GPU(s): NVIDIA A100-SXM4-80GB
04/02/2025 07:10:11 - INFO  - 🔒 Training samples: 37822
04/02/2025 07:10:11 - INFO  - 🔍 Evaluation samples: 4203
04/02/2025 07:10:11 - INFO  - 📊 Steps per epoch: 1575
04/02/2025 07:10:11 - INFO  - 🪜 Total training steps: 4725
04/02/2025 07:10:11 - INFO  - ✅ Training in FFT or LoRA mode.
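The sample and step counts above are mutually consistent with the batch settings logged below (per_device_train_batch_size=1, gradient_accumulation_steps=4, 6 GPUs), assuming the usual data-parallel effective-batch arithmetic:

```python
train_samples = 37_822  # packed training samples, from the log
per_device_bs = 1       # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_gpus = 6            # NVIDIA A100-SXM4-80GB x 6

effective_batch = per_device_bs * grad_accum * num_gpus  # 24
steps_per_epoch = train_samples // effective_batch       # 1575
total_steps = steps_per_epoch * 3                        # 4725, with num_train_epochs=3
print(steps_per_epoch, total_steps)
```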
04/02/2025 07:10:24 - INFO  - Initializing LoRA model...
04/02/2025 07:10:27 - INFO  - 📌 LoRA Configuration:
04/02/2025 07:10:27 - INFO  -   - task_type: CAUSAL_LM
04/02/2025 07:10:27 - INFO  -   - peft_type: PeftType.LORA
04/02/2025 07:10:27 - INFO  -   - auto_mapping: None
04/02/2025 07:10:27 - INFO  -   - base_model_name_or_path: google/gemma-3-27b-pt
04/02/2025 07:10:27 - INFO  -   - revision: None
04/02/2025 07:10:27 - INFO  -   - inference_mode: False
04/02/2025 07:10:27 - INFO  -   - r: 32
04/02/2025 07:10:27 - INFO  -   - target_modules: {'q_proj', 'v_proj', 'k_proj', 'gate_proj', 'down_proj', 'up_proj', 'o_proj'}
04/02/2025 07:10:27 - INFO  -   - exclude_modules: None
04/02/2025 07:10:27 - INFO  -   - lora_alpha: 16
04/02/2025 07:10:27 - INFO  -   - lora_dropout: 0.05
04/02/2025 07:10:27 - INFO  -   - fan_in_fan_out: False
04/02/2025 07:10:27 - INFO  -   - bias: none
04/02/2025 07:10:27 - INFO  -   - use_rslora: False
04/02/2025 07:10:27 - INFO  -   - modules_to_save: None
04/02/2025 07:10:27 - INFO  -   - init_lora_weights: True
04/02/2025 07:10:27 - INFO  -   - layers_to_transform: None
04/02/2025 07:10:27 - INFO  -   - layers_pattern: None
04/02/2025 07:10:27 - INFO  -   - rank_pattern: {}
04/02/2025 07:10:27 - INFO  -   - alpha_pattern: {}
04/02/2025 07:10:27 - INFO  -   - megatron_config: None
04/02/2025 07:10:27 - INFO  -   - megatron_core: megatron.core
04/02/2025 07:10:27 - INFO  -   - trainable_token_indices: None
04/02/2025 07:10:27 - INFO  -   - loftq_config: {}
04/02/2025 07:10:27 - INFO  -   - eva_config: None
04/02/2025 07:10:27 - INFO  -   - corda_config: None
04/02/2025 07:10:27 - INFO  -   - use_dora: False
04/02/2025 07:10:27 - INFO  -   - layer_replication: None
04/02/2025 07:10:27 - INFO  -   - lora_bias: False
04/02/2025 07:10:27 - INFO  - 🧠 Trainable params: 227033088 / 27236379392 (0.83%)
04/02/2025 07:10:27 - INFO  - 📌 SFT Configuration:
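The dumped LoRA settings map directly onto peft's LoraConfig. A sketch of how this adapter setup could be reproduced (the model-loading code is an assumption, not taken from the script):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-27b-pt", torch_dtype=torch.bfloat16, device_map="auto"
)

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    # all attention and MLP projections, as listed in the log
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # ~227M of ~27.2B params (0.83%) trainable
```

Note that with r=32 and lora_alpha=16, adapter updates are scaled by lora_alpha / r = 0.5.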
04/02/2025 07:10:27 - INFO  -   - output_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO  -   - overwrite_output_dir: False
04/02/2025 07:10:27 - INFO  -   - do_train: False
04/02/2025 07:10:27 - INFO  -   - do_eval: False
04/02/2025 07:10:27 - INFO  -   - do_predict: False
04/02/2025 07:10:27 - INFO  -   - eval_strategy: no
04/02/2025 07:10:27 - INFO  -   - prediction_loss_only: False
04/02/2025 07:10:27 - INFO  -   - per_device_train_batch_size: 1
04/02/2025 07:10:27 - INFO  -   - per_device_eval_batch_size: 8
04/02/2025 07:10:27 - INFO  -   - per_gpu_train_batch_size: None
04/02/2025 07:10:27 - INFO  -   - per_gpu_eval_batch_size: None
04/02/2025 07:10:27 - INFO  -   - gradient_accumulation_steps: 4
04/02/2025 07:10:27 - INFO  -   - eval_accumulation_steps: None
04/02/2025 07:10:27 - INFO  -   - eval_delay: 0
04/02/2025 07:10:27 - INFO  -   - torch_empty_cache_steps: None
04/02/2025 07:10:27 - INFO  -   - learning_rate: 0.0001
04/02/2025 07:10:27 - INFO  -   - weight_decay: 0.0
04/02/2025 07:10:27 - INFO  -   - adam_beta1: 0.9
04/02/2025 07:10:27 - INFO  -   - adam_beta2: 0.999
04/02/2025 07:10:27 - INFO  -   - adam_epsilon: 1e-08
04/02/2025 07:10:27 - INFO  -   - max_grad_norm: 0.3
04/02/2025 07:10:27 - INFO  -   - num_train_epochs: 3
04/02/2025 07:10:27 - INFO  -   - max_steps: -1
04/02/2025 07:10:27 - INFO  -   - lr_scheduler_type: constant
04/02/2025 07:10:27 - INFO  -   - lr_scheduler_kwargs: {}
04/02/2025 07:10:27 - INFO  -   - warmup_ratio: 0.03
04/02/2025 07:10:27 - INFO  -   - warmup_steps: 0
04/02/2025 07:10:27 - INFO  -   - log_level: passive
04/02/2025 07:10:27 - INFO  -   - log_level_replica: warning
04/02/2025 07:10:27 - INFO  -   - log_on_each_node: True
04/02/2025 07:10:27 - INFO  -   - logging_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA/runs/Apr02_07-10-27_llm-server-779876f58-9zzqd
04/02/2025 07:10:27 - INFO  -   - logging_strategy: steps
04/02/2025 07:10:27 - INFO  -   - logging_first_step: False
04/02/2025 07:10:27 - INFO  -   - logging_steps: 10
04/02/2025 07:10:27 - INFO  -   - logging_nan_inf_filter: True
04/02/2025 07:10:27 - INFO  -   - save_strategy: epoch
04/02/2025 07:10:27 - INFO  -   - save_steps: 500
04/02/2025 07:10:27 - INFO  -   - save_total_limit: None
04/02/2025 07:10:27 - INFO  -   - save_safetensors: True
04/02/2025 07:10:27 - INFO  -   - save_on_each_node: False
04/02/2025 07:10:27 - INFO  -   - save_only_model: False
04/02/2025 07:10:27 - INFO  -   - restore_callback_states_from_checkpoint: False
04/02/2025 07:10:27 - INFO  -   - no_cuda: False
04/02/2025 07:10:27 - INFO  -   - use_cpu: False
04/02/2025 07:10:27 - INFO  -   - use_mps_device: False
04/02/2025 07:10:27 - INFO  -   - seed: 42
04/02/2025 07:10:27 - INFO  -   - data_seed: None
04/02/2025 07:10:27 - INFO  -   - jit_mode_eval: False
04/02/2025 07:10:27 - INFO  -   - use_ipex: False
04/02/2025 07:10:27 - INFO  -   - bf16: True
04/02/2025 07:10:27 - INFO  -   - fp16: False
04/02/2025 07:10:27 - INFO  -   - fp16_opt_level: O1
04/02/2025 07:10:27 - INFO  -   - half_precision_backend: auto
04/02/2025 07:10:27 - INFO  -   - bf16_full_eval: False
04/02/2025 07:10:27 - INFO  -   - fp16_full_eval: False
04/02/2025 07:10:27 - INFO  -   - tf32: None
04/02/2025 07:10:27 - INFO  -   - local_rank: 0
04/02/2025 07:10:27 - INFO  -   - ddp_backend: None
04/02/2025 07:10:27 - INFO  -   - tpu_num_cores: None
04/02/2025 07:10:27 - INFO  -   - tpu_metrics_debug: False
04/02/2025 07:10:27 - INFO  -   - debug: []
04/02/2025 07:10:27 - INFO  -   - dataloader_drop_last: False
04/02/2025 07:10:27 - INFO  -   - eval_steps: None
04/02/2025 07:10:27 - INFO  -   - dataloader_num_workers: 0
04/02/2025 07:10:27 - INFO  -   - dataloader_prefetch_factor: None
04/02/2025 07:10:27 - INFO  -   - past_index: -1
04/02/2025 07:10:27 - INFO  -   - run_name: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO  -   - disable_tqdm: False
04/02/2025 07:10:27 - INFO  -   - remove_unused_columns: True
04/02/2025 07:10:27 - INFO  -   - label_names: ['labels']
04/02/2025 07:10:27 - INFO  -   - load_best_model_at_end: False
04/02/2025 07:10:27 - INFO  -   - metric_for_best_model: None
04/02/2025 07:10:27 - INFO  -   - greater_is_better: None
04/02/2025 07:10:27 - INFO  -   - ignore_data_skip: False
04/02/2025 07:10:27 - INFO  -   - fsdp: []
04/02/2025 07:10:27 - INFO  -   - fsdp_min_num_params: 0
04/02/2025 07:10:27 - INFO  -   - fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
04/02/2025 07:10:27 - INFO  -   - tp_size: 0
04/02/2025 07:10:27 - INFO  -   - fsdp_transformer_layer_cls_to_wrap: None
04/02/2025 07:10:27 - INFO  -   - accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
04/02/2025 07:10:27 - INFO  -   - deepspeed: None
04/02/2025 07:10:27 - INFO  -   - label_smoothing_factor: 0.0
04/02/2025 07:10:27 - INFO  -   - optim: adamw_torch_fused
04/02/2025 07:10:27 - INFO  -   - optim_args: None
04/02/2025 07:10:27 - INFO  -   - adafactor: False
04/02/2025 07:10:27 - INFO  -   - group_by_length: False
04/02/2025 07:10:27 - INFO  -   - length_column_name: length
04/02/2025 07:10:27 - INFO  -   - report_to: ['tensorboard']
04/02/2025 07:10:27 - INFO  -   - ddp_find_unused_parameters: None
04/02/2025 07:10:27 - INFO  -   - ddp_bucket_cap_mb: None
04/02/2025 07:10:27 - INFO  -   - ddp_broadcast_buffers: None
04/02/2025 07:10:27 - INFO  -   - dataloader_pin_memory: True
04/02/2025 07:10:27 - INFO  -   - dataloader_persistent_workers: False
04/02/2025 07:10:27 - INFO  -   - skip_memory_metrics: True
04/02/2025 07:10:27 - INFO  -   - use_legacy_prediction_loop: False
04/02/2025 07:10:27 - INFO  -   - push_to_hub: True
04/02/2025 07:10:27 - INFO  -   - resume_from_checkpoint: None
04/02/2025 07:10:27 - INFO  -   - hub_model_id: None
04/02/2025 07:10:27 - INFO  -   - hub_strategy: every_save
04/02/2025 07:10:27 - INFO  -   - hub_token: <HUB_TOKEN>
04/02/2025 07:10:27 - INFO  -   - hub_private_repo: None
04/02/2025 07:10:27 - INFO  -   - hub_always_push: False
04/02/2025 07:10:27 - INFO  -   - gradient_checkpointing: False
04/02/2025 07:10:27 - INFO  -   - gradient_checkpointing_kwargs: None
04/02/2025 07:10:27 - INFO  -   - include_inputs_for_metrics: False
04/02/2025 07:10:27 - INFO  -   - include_for_metrics: []
04/02/2025 07:10:27 - INFO  -   - eval_do_concat_batches: True
04/02/2025 07:10:27 - INFO  -   - fp16_backend: auto
04/02/2025 07:10:27 - INFO  -   - evaluation_strategy: None
04/02/2025 07:10:27 - INFO  -   - push_to_hub_model_id: None
04/02/2025 07:10:27 - INFO  -   - push_to_hub_organization: None
04/02/2025 07:10:27 - INFO  -   - push_to_hub_token: <PUSH_TO_HUB_TOKEN>
04/02/2025 07:10:27 - INFO  -   - mp_parameters: 
04/02/2025 07:10:27 - INFO  -   - auto_find_batch_size: False
04/02/2025 07:10:27 - INFO  -   - full_determinism: False
04/02/2025 07:10:27 - INFO  -   - torchdynamo: None
04/02/2025 07:10:27 - INFO  -   - ray_scope: last
04/02/2025 07:10:27 - INFO  -   - ddp_timeout: 1800
04/02/2025 07:10:27 - INFO  -   - torch_compile: False
04/02/2025 07:10:27 - INFO  -   - torch_compile_backend: None
04/02/2025 07:10:27 - INFO  -   - torch_compile_mode: None
04/02/2025 07:10:27 - INFO  -   - dispatch_batches: None
04/02/2025 07:10:27 - INFO  -   - split_batches: None
04/02/2025 07:10:27 - INFO  -   - include_tokens_per_second: False
04/02/2025 07:10:27 - INFO  -   - include_num_input_tokens_seen: False
04/02/2025 07:10:27 - INFO  -   - neftune_noise_alpha: None
04/02/2025 07:10:27 - INFO  -   - optim_target_modules: None
04/02/2025 07:10:27 - INFO  -   - batch_eval_metrics: False
04/02/2025 07:10:27 - INFO  -   - eval_on_start: False
04/02/2025 07:10:27 - INFO  -   - use_liger_kernel: False
04/02/2025 07:10:27 - INFO  -   - eval_use_gather_object: False
04/02/2025 07:10:27 - INFO  -   - average_tokens_across_devices: False
04/02/2025 07:10:27 - INFO  -   - model_init_kwargs: None
04/02/2025 07:10:27 - INFO  -   - dataset_text_field: text
04/02/2025 07:10:27 - INFO  -   - dataset_kwargs: {'add_special_tokens': False, 'append_concat_token': True}
04/02/2025 07:10:27 - INFO  -   - dataset_num_proc: None
04/02/2025 07:10:27 - INFO  -   - max_length: 512
04/02/2025 07:10:27 - INFO  -   - packing: True
04/02/2025 07:10:27 - INFO  -   - padding_free: False
04/02/2025 07:10:27 - INFO  -   - eval_packing: None
04/02/2025 07:10:27 - INFO  -   - dataset_batch_size: None
04/02/2025 07:10:27 - INFO  -   - num_of_sequences: None
04/02/2025 07:10:27 - INFO  -   - chars_per_token: <CHARS_PER_TOKEN>
04/02/2025 07:10:27 - INFO  -   - max_seq_length: 512
04/02/2025 07:10:27 - INFO  -   - use_liger: None
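The SFT dump above corresponds to trl's SFTConfig (a TrainingArguments subclass). A hedged sketch of the key non-default fields; exact field names vary slightly across trl versions (the log shows both max_length and max_seq_length as 512), and model / train_dataset are assumed to exist:

```python
from trl import SFTConfig, SFTTrainer

sft_config = SFTConfig(
    output_dir="output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=1e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    max_grad_norm=0.3,
    bf16=True,
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy="epoch",
    report_to="tensorboard",
    push_to_hub=True,
    dataset_text_field="text",
    packing=True,
    max_seq_length=512,
)
trainer = SFTTrainer(model=model, args=sft_config, train_dataset=train_dataset)
trainer.train()
```

One detail worth flagging: warmup_ratio=0.03 has no effect here, since the plain constant scheduler (unlike constant_with_warmup) performs no warmup.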
04/02/2025 07:10:32 - INFO  - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmp7kmj2aos/test.c -o /tmp/tmp7kmj2aos/test.o
04/02/2025 07:10:32 - INFO  - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmpsq2ww307/test.c -o /tmp/tmpsq2ww307/test.o
04/02/2025 07:10:34 - INFO  - Start Training!
04/02/2025 07:11:01 - INFO  - [Epoch 0.11] [Step 10] loss: 3.8091
04/02/2025 07:11:25 - INFO  - [Epoch 0.22] [Step 20] loss: 3.1515
04/02/2025 07:11:48 - INFO  - [Epoch 0.33] [Step 30] loss: 3.0086
04/02/2025 07:12:11 - INFO  - [Epoch 0.44] [Step 40] loss: 2.9523
04/02/2025 07:12:34 - INFO  - [Epoch 0.55] [Step 50] loss: 2.9285
04/02/2025 07:12:56 - INFO  - [Epoch 0.66] [Step 60] loss: 2.9137
04/02/2025 07:13:19 - INFO  - [Epoch 0.77] [Step 70] loss: 2.8934
04/02/2025 07:13:42 - INFO  - [Epoch 0.88] [Step 80] loss: 2.8740
04/02/2025 07:14:05 - INFO  - [Epoch 0.99] [Step 90] loss: 2.8733
04/02/2025 07:15:17 - INFO  - [Epoch 1.09] [Step 100] loss: 2.7949
04/02/2025 07:15:40 - INFO  - [Epoch 1.20] [Step 110] loss: 2.7914
04/02/2025 07:16:03 - INFO  - [Epoch 1.31] [Step 120] loss: 2.7842
04/02/2025 07:16:26 - INFO  - [Epoch 1.42] [Step 130] loss: 2.7768
04/02/2025 07:16:48 - INFO  - [Epoch 1.53] [Step 140] loss: 2.7753
04/02/2025 07:17:11 - INFO  - [Epoch 1.64] [Step 150] loss: 2.7787
04/02/2025 07:17:33 - INFO  - [Epoch 1.75] [Step 160] loss: 2.7740
04/02/2025 07:17:56 - INFO  - [Epoch 1.85] [Step 170] loss: 2.7716
04/02/2025 07:18:18 - INFO  - [Epoch 1.96] [Step 180] loss: 2.7539
04/02/2025 07:20:10 - INFO  - [Epoch 2.07] [Step 190] loss: 2.6976
04/02/2025 07:20:33 - INFO  - [Epoch 2.18] [Step 200] loss: 2.6525
04/02/2025 07:20:56 - INFO  - [Epoch 2.28] [Step 210] loss: 2.6456
04/02/2025 07:21:19 - INFO  - [Epoch 2.39] [Step 220] loss: 2.6509
04/02/2025 07:21:42 - INFO  - [Epoch 2.50] [Step 230] loss: 2.6692
04/02/2025 07:22:04 - INFO  - [Epoch 2.61] [Step 240] loss: 2.6591
04/02/2025 07:22:27 - INFO  - [Epoch 2.72] [Step 250] loss: 2.6635
04/02/2025 07:22:50 - INFO  - [Epoch 2.83] [Step 260] loss: 2.6684
04/02/2025 07:23:12 - INFO  - [Epoch 2.94] [Step 270] loss: 2.6692
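Loss falls steeply in the first epoch (3.81 → 2.87) and largely plateaus in the third (~2.65-2.67). The "[Epoch e] [Step s] loss: x" format is custom rather than transformers' default output; one plausible way to produce it is a small TrainerCallback (a sketch, not the project's actual code):

```python
import logging
from transformers import TrainerCallback

logger = logging.getLogger(__name__)

class LossLogCallback(TrainerCallback):
    """Re-emit trainer loss logs as '[Epoch e] [Step s] loss: x' lines."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            logger.info(
                f"[Epoch {state.epoch:.2f}] [Step {state.global_step}] "
                f"loss: {logs['loss']:.4f}"
            )

# registered on the trainer, e.g.: trainer.add_callback(LossLogCallback())
```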
04/02/2025 07:25:16 - INFO  - ✅ Training complete. Logging system usage...
04/02/2025 07:25:16 - INFO  - >> System Usage - CPU: 2.8%, RAM: 3.2%, SSD: 76.20GB / 1888.43GB
04/02/2025 07:25:16 - INFO  - >> GPU 0: 73.78 GB used
04/02/2025 07:25:16 - INFO  - >> GPU 1: 79.22 GB used
04/02/2025 07:25:16 - INFO  - >> GPU 2: 74.50 GB used
04/02/2025 07:25:16 - INFO  - >> GPU 3: 73.50 GB used
04/02/2025 07:25:16 - INFO  - >> GPU 4: 73.44 GB used
04/02/2025 07:25:16 - INFO  - >> GPU 5: 73.24 GB used
04/02/2025 07:25:16 - INFO  - >> Total GPU Memory Used: 447.68 GB
04/02/2025 07:25:16 - INFO  - >> Total GPU Power Consumption: 531.29 W
04/02/2025 07:27:22 - INFO  - ✅ Training completed in 0h 16m 48s
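The system-usage block above (CPU/RAM/SSD plus per-GPU memory and total power) can be collected with psutil and NVIDIA's NVML bindings. A self-contained sketch assuming the psutil and pynvml packages (the run's actual collection code is not shown):

```python
import psutil
import pynvml

def log_system_usage():
    vm = psutil.virtual_memory()
    disk = psutil.disk_usage("/")
    print(f">> System Usage - CPU: {psutil.cpu_percent()}%, RAM: {vm.percent}%, "
          f"SSD: {disk.used / 2**30:.2f}GB / {disk.total / 2**30:.2f}GB")

    pynvml.nvmlInit()
    total_mem_gb, total_power_w = 0.0, 0.0
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        used_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 2**30
        total_mem_gb += used_gb
        total_power_w += pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # mW -> W
        print(f">> GPU {i}: {used_gb:.2f} GB used")
    print(f">> Total GPU Memory Used: {total_mem_gb:.2f} GB")
    print(f">> Total GPU Power Consumption: {total_power_w:.2f} W")
    pynvml.nvmlShutdown()

log_system_usage()
```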