04/02/2025 07:09:10 - INFO - Train data file: finetuning_data_25_sentences.json
04/02/2025 07:09:10 - INFO - Output Directory: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:09:10 - INFO - Experiment name: KETI_b1_s4_e3
04/02/2025 07:09:10 - INFO - torch_dtype: torch.bfloat16
04/02/2025 07:09:10 - INFO - Start inference on base model: google/gemma-3-27b-it
04/02/2025 07:10:10 - INFO - base_model and tokenizer released from memory
04/02/2025 07:10:10 - INFO - Using 6 GPU(s): NVIDIA A100-SXM4-80GB
04/02/2025 07:10:11 - INFO - Training samples: 37822
04/02/2025 07:10:11 - INFO - Evaluation samples: 4203
04/02/2025 07:10:11 - INFO - Steps per epoch: 1575
04/02/2025 07:10:11 - INFO - Total training steps: 4725
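For reference, the logged step counts are consistent with the effective batch size implied by the rest of this log (per_device_train_batch_size 1, gradient_accumulation_steps 4, 6 GPUs). A minimal sketch of that arithmetic, assuming the script floors the per-epoch count; variable names are illustrative:

# Hedged sketch: reproduces the logged step counts from values reported elsewhere in this log.
num_train_samples = 37822
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 6
num_train_epochs = 3

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus  # 24
steps_per_epoch = num_train_samples // effective_batch_size   # 1575
total_training_steps = steps_per_epoch * num_train_epochs     # 4725

Because packing is enabled (packing: True with max_length 512 in the SFT configuration below), the actual number of optimizer steps per epoch can be far smaller than this estimate, which is consistent with the run finishing near step 270.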
04/02/2025 07:10:11 - INFO - Training in FFT or LoRA mode.
04/02/2025 07:10:24 - INFO - Initializing LORA model...
04/02/2025 07:10:27 - INFO - LoRA Configuration:
04/02/2025 07:10:27 - INFO - - task_type: CAUSAL_LM
04/02/2025 07:10:27 - INFO - - peft_type: PeftType.LORA
04/02/2025 07:10:27 - INFO - - auto_mapping: None
04/02/2025 07:10:27 - INFO - - base_model_name_or_path: google/gemma-3-27b-pt
04/02/2025 07:10:27 - INFO - - revision: None
04/02/2025 07:10:27 - INFO - - inference_mode: False
04/02/2025 07:10:27 - INFO - - r: 32
04/02/2025 07:10:27 - INFO - - target_modules: {'q_proj', 'v_proj', 'k_proj', 'gate_proj', 'down_proj', 'up_proj', 'o_proj'}
04/02/2025 07:10:27 - INFO - - exclude_modules: None
04/02/2025 07:10:27 - INFO - - lora_alpha: 16
04/02/2025 07:10:27 - INFO - - lora_dropout: 0.05
04/02/2025 07:10:27 - INFO - - fan_in_fan_out: False
04/02/2025 07:10:27 - INFO - - bias: none
04/02/2025 07:10:27 - INFO - - use_rslora: False
04/02/2025 07:10:27 - INFO - - modules_to_save: None
04/02/2025 07:10:27 - INFO - - init_lora_weights: True
04/02/2025 07:10:27 - INFO - - layers_to_transform: None
04/02/2025 07:10:27 - INFO - - layers_pattern: None
04/02/2025 07:10:27 - INFO - - rank_pattern: {}
04/02/2025 07:10:27 - INFO - - alpha_pattern: {}
04/02/2025 07:10:27 - INFO - - megatron_config: None
04/02/2025 07:10:27 - INFO - - megatron_core: megatron.core
04/02/2025 07:10:27 - INFO - - trainable_token_indices: None
04/02/2025 07:10:27 - INFO - - loftq_config: {}
04/02/2025 07:10:27 - INFO - - eva_config: None
04/02/2025 07:10:27 - INFO - - corda_config: None
04/02/2025 07:10:27 - INFO - - use_dora: False
04/02/2025 07:10:27 - INFO - - layer_replication: None
04/02/2025 07:10:27 - INFO - - lora_bias: False
04/02/2025 07:10:27 - INFO - Trainable params: 227033088 / 27236379392 (0.83%)
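The values above map directly onto a peft LoraConfig. A minimal sketch of an equivalent setup, assuming the base model is loaded with transformers (the loading call and dtype are illustrative; the LoRA hyperparameters come from this log):

# Hedged sketch: rebuilds the logged LoRA configuration with peft.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-27b-pt",
    torch_dtype=torch.bfloat16,   # matches the logged torch_dtype
)

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # should report roughly 227M of 27.2B params (~0.83%) trainable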
04/02/2025 07:10:27 - INFO - SFT Configuration:
04/02/2025 07:10:27 - INFO - - output_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO - - overwrite_output_dir: False
04/02/2025 07:10:27 - INFO - - do_train: False
04/02/2025 07:10:27 - INFO - - do_eval: False
04/02/2025 07:10:27 - INFO - - do_predict: False
04/02/2025 07:10:27 - INFO - - eval_strategy: no
04/02/2025 07:10:27 - INFO - - prediction_loss_only: False
04/02/2025 07:10:27 - INFO - - per_device_train_batch_size: 1
04/02/2025 07:10:27 - INFO - - per_device_eval_batch_size: 8
04/02/2025 07:10:27 - INFO - - per_gpu_train_batch_size: None
04/02/2025 07:10:27 - INFO - - per_gpu_eval_batch_size: None
04/02/2025 07:10:27 - INFO - - gradient_accumulation_steps: 4
04/02/2025 07:10:27 - INFO - - eval_accumulation_steps: None
04/02/2025 07:10:27 - INFO - - eval_delay: 0
04/02/2025 07:10:27 - INFO - - torch_empty_cache_steps: None
04/02/2025 07:10:27 - INFO - - learning_rate: 0.0001
04/02/2025 07:10:27 - INFO - - weight_decay: 0.0
04/02/2025 07:10:27 - INFO - - adam_beta1: 0.9
04/02/2025 07:10:27 - INFO - - adam_beta2: 0.999
04/02/2025 07:10:27 - INFO - - adam_epsilon: 1e-08
04/02/2025 07:10:27 - INFO - - max_grad_norm: 0.3
04/02/2025 07:10:27 - INFO - - num_train_epochs: 3
04/02/2025 07:10:27 - INFO - - max_steps: -1
04/02/2025 07:10:27 - INFO - - lr_scheduler_type: constant
04/02/2025 07:10:27 - INFO - - lr_scheduler_kwargs: {}
04/02/2025 07:10:27 - INFO - - warmup_ratio: 0.03
04/02/2025 07:10:27 - INFO - - warmup_steps: 0
04/02/2025 07:10:27 - INFO - - log_level: passive
04/02/2025 07:10:27 - INFO - - log_level_replica: warning
04/02/2025 07:10:27 - INFO - - log_on_each_node: True
04/02/2025 07:10:27 - INFO - - logging_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA/runs/Apr02_07-10-27_llm-server-779876f58-9zzqd
04/02/2025 07:10:27 - INFO - - logging_strategy: steps
04/02/2025 07:10:27 - INFO - - logging_first_step: False
04/02/2025 07:10:27 - INFO - - logging_steps: 10
04/02/2025 07:10:27 - INFO - - logging_nan_inf_filter: True
04/02/2025 07:10:27 - INFO - - save_strategy: epoch
04/02/2025 07:10:27 - INFO - - save_steps: 500
04/02/2025 07:10:27 - INFO - - save_total_limit: None
04/02/2025 07:10:27 - INFO - - save_safetensors: True
04/02/2025 07:10:27 - INFO - - save_on_each_node: False
04/02/2025 07:10:27 - INFO - - save_only_model: False
04/02/2025 07:10:27 - INFO - - restore_callback_states_from_checkpoint: False
04/02/2025 07:10:27 - INFO - - no_cuda: False
04/02/2025 07:10:27 - INFO - - use_cpu: False
04/02/2025 07:10:27 - INFO - - use_mps_device: False
04/02/2025 07:10:27 - INFO - - seed: 42
04/02/2025 07:10:27 - INFO - - data_seed: None
04/02/2025 07:10:27 - INFO - - jit_mode_eval: False
04/02/2025 07:10:27 - INFO - - use_ipex: False
04/02/2025 07:10:27 - INFO - - bf16: True
04/02/2025 07:10:27 - INFO - - fp16: False
04/02/2025 07:10:27 - INFO - - fp16_opt_level: O1
04/02/2025 07:10:27 - INFO - - half_precision_backend: auto
04/02/2025 07:10:27 - INFO - - bf16_full_eval: False
04/02/2025 07:10:27 - INFO - - fp16_full_eval: False
04/02/2025 07:10:27 - INFO - - tf32: None
04/02/2025 07:10:27 - INFO - - local_rank: 0
04/02/2025 07:10:27 - INFO - - ddp_backend: None
04/02/2025 07:10:27 - INFO - - tpu_num_cores: None
04/02/2025 07:10:27 - INFO - - tpu_metrics_debug: False
04/02/2025 07:10:27 - INFO - - debug: []
04/02/2025 07:10:27 - INFO - - dataloader_drop_last: False
04/02/2025 07:10:27 - INFO - - eval_steps: None
04/02/2025 07:10:27 - INFO - - dataloader_num_workers: 0
04/02/2025 07:10:27 - INFO - - dataloader_prefetch_factor: None
04/02/2025 07:10:27 - INFO - - past_index: -1
04/02/2025 07:10:27 - INFO - - run_name: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO - - disable_tqdm: False
04/02/2025 07:10:27 - INFO - - remove_unused_columns: True
04/02/2025 07:10:27 - INFO - - label_names: ['labels']
04/02/2025 07:10:27 - INFO - - load_best_model_at_end: False
04/02/2025 07:10:27 - INFO - - metric_for_best_model: None
04/02/2025 07:10:27 - INFO - - greater_is_better: None
04/02/2025 07:10:27 - INFO - - ignore_data_skip: False
04/02/2025 07:10:27 - INFO - - fsdp: []
04/02/2025 07:10:27 - INFO - - fsdp_min_num_params: 0
04/02/2025 07:10:27 - INFO - - fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
04/02/2025 07:10:27 - INFO - - tp_size: 0
04/02/2025 07:10:27 - INFO - - fsdp_transformer_layer_cls_to_wrap: None
04/02/2025 07:10:27 - INFO - - accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
04/02/2025 07:10:27 - INFO - - deepspeed: None
04/02/2025 07:10:27 - INFO - - label_smoothing_factor: 0.0
04/02/2025 07:10:27 - INFO - - optim: adamw_torch_fused
04/02/2025 07:10:27 - INFO - - optim_args: None
04/02/2025 07:10:27 - INFO - - adafactor: False
04/02/2025 07:10:27 - INFO - - group_by_length: False
04/02/2025 07:10:27 - INFO - - length_column_name: length
04/02/2025 07:10:27 - INFO - - report_to: ['tensorboard']
04/02/2025 07:10:27 - INFO - - ddp_find_unused_parameters: None
04/02/2025 07:10:27 - INFO - - ddp_bucket_cap_mb: None
04/02/2025 07:10:27 - INFO - - ddp_broadcast_buffers: None
04/02/2025 07:10:27 - INFO - - dataloader_pin_memory: True
04/02/2025 07:10:27 - INFO - - dataloader_persistent_workers: False
04/02/2025 07:10:27 - INFO - - skip_memory_metrics: True
04/02/2025 07:10:27 - INFO - - use_legacy_prediction_loop: False
04/02/2025 07:10:27 - INFO - - push_to_hub: True
04/02/2025 07:10:27 - INFO - - resume_from_checkpoint: None
04/02/2025 07:10:27 - INFO - - hub_model_id: None
04/02/2025 07:10:27 - INFO - - hub_strategy: every_save
04/02/2025 07:10:27 - INFO - - hub_token: <HUB_TOKEN>
04/02/2025 07:10:27 - INFO - - hub_private_repo: None
04/02/2025 07:10:27 - INFO - - hub_always_push: False
04/02/2025 07:10:27 - INFO - - gradient_checkpointing: False
04/02/2025 07:10:27 - INFO - - gradient_checkpointing_kwargs: None
04/02/2025 07:10:27 - INFO - - include_inputs_for_metrics: False
04/02/2025 07:10:27 - INFO - - include_for_metrics: []
04/02/2025 07:10:27 - INFO - - eval_do_concat_batches: True
04/02/2025 07:10:27 - INFO - - fp16_backend: auto
04/02/2025 07:10:27 - INFO - - evaluation_strategy: None
04/02/2025 07:10:27 - INFO - - push_to_hub_model_id: None
04/02/2025 07:10:27 - INFO - - push_to_hub_organization: None
04/02/2025 07:10:27 - INFO - - push_to_hub_token: <PUSH_TO_HUB_TOKEN>
04/02/2025 07:10:27 - INFO - - mp_parameters:
04/02/2025 07:10:27 - INFO - - auto_find_batch_size: False
04/02/2025 07:10:27 - INFO - - full_determinism: False
04/02/2025 07:10:27 - INFO - - torchdynamo: None
04/02/2025 07:10:27 - INFO - - ray_scope: last
04/02/2025 07:10:27 - INFO - - ddp_timeout: 1800
04/02/2025 07:10:27 - INFO - - torch_compile: False
04/02/2025 07:10:27 - INFO - - torch_compile_backend: None
04/02/2025 07:10:27 - INFO - - torch_compile_mode: None
04/02/2025 07:10:27 - INFO - - dispatch_batches: None
04/02/2025 07:10:27 - INFO - - split_batches: None
04/02/2025 07:10:27 - INFO - - include_tokens_per_second: False
04/02/2025 07:10:27 - INFO - - include_num_input_tokens_seen: False
04/02/2025 07:10:27 - INFO - - neftune_noise_alpha: None
04/02/2025 07:10:27 - INFO - - optim_target_modules: None
04/02/2025 07:10:27 - INFO - - batch_eval_metrics: False
04/02/2025 07:10:27 - INFO - - eval_on_start: False
04/02/2025 07:10:27 - INFO - - use_liger_kernel: False
04/02/2025 07:10:27 - INFO - - eval_use_gather_object: False
04/02/2025 07:10:27 - INFO - - average_tokens_across_devices: False
04/02/2025 07:10:27 - INFO - - model_init_kwargs: None
04/02/2025 07:10:27 - INFO - - dataset_text_field: text
04/02/2025 07:10:27 - INFO - - dataset_kwargs: {'add_special_tokens': False, 'append_concat_token': True}
04/02/2025 07:10:27 - INFO - - dataset_num_proc: None
04/02/2025 07:10:27 - INFO - - max_length: 512
04/02/2025 07:10:27 - INFO - - packing: True
04/02/2025 07:10:27 - INFO - - padding_free: False
04/02/2025 07:10:27 - INFO - - eval_packing: None
04/02/2025 07:10:27 - INFO - - dataset_batch_size: None
04/02/2025 07:10:27 - INFO - - num_of_sequences: None
04/02/2025 07:10:27 - INFO - - chars_per_token: <CHARS_PER_TOKEN>
04/02/2025 07:10:27 - INFO - - max_seq_length: 512
04/02/2025 07:10:27 - INFO - - use_liger: None
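The key values in the dump above correspond to a trl SFTConfig passed to an SFTTrainer. A minimal sketch, assuming a recent trl release (argument names such as max_seq_length vs. max_length differ between versions; the dataset variable is a placeholder for the 37,822-sample training split):

# Hedged sketch: the hyperparameters mirror the logged SFT configuration; dataset handling is illustrative.
from trl import SFTConfig, SFTTrainer

sft_config = SFTConfig(
    output_dir="output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    max_grad_norm=0.3,
    optim="adamw_torch_fused",
    bf16=True,
    logging_steps=10,
    save_strategy="epoch",
    report_to="tensorboard",
    push_to_hub=True,
    dataset_text_field="text",
    max_seq_length=512,   # logged as both max_length and max_seq_length
    packing=True,
)

trainer = SFTTrainer(
    model=model,                  # the PEFT-wrapped model from the LoRA sketch above
    args=sft_config,
    train_dataset=train_dataset,  # placeholder for the prepared training split
)
trainer.train()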
04/02/2025 07:10:32 - INFO - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmp7kmj2aos/test.c -o /tmp/tmp7kmj2aos/test.o
04/02/2025 07:10:32 - INFO - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmpsq2ww307/test.c -o /tmp/tmpsq2ww307/test.o
04/02/2025 07:10:34 - INFO - Start Training !
04/02/2025 07:11:01 - INFO - [Epoch 0.11] [Step 10] loss: 3.8091
04/02/2025 07:11:25 - INFO - [Epoch 0.22] [Step 20] loss: 3.1515
04/02/2025 07:11:48 - INFO - [Epoch 0.33] [Step 30] loss: 3.0086
04/02/2025 07:12:11 - INFO - [Epoch 0.44] [Step 40] loss: 2.9523
04/02/2025 07:12:34 - INFO - [Epoch 0.55] [Step 50] loss: 2.9285
04/02/2025 07:12:56 - INFO - [Epoch 0.66] [Step 60] loss: 2.9137
04/02/2025 07:13:19 - INFO - [Epoch 0.77] [Step 70] loss: 2.8934
04/02/2025 07:13:42 - INFO - [Epoch 0.88] [Step 80] loss: 2.8740
04/02/2025 07:14:05 - INFO - [Epoch 0.99] [Step 90] loss: 2.8733
04/02/2025 07:15:17 - INFO - [Epoch 1.09] [Step 100] loss: 2.7949
04/02/2025 07:15:40 - INFO - [Epoch 1.20] [Step 110] loss: 2.7914
04/02/2025 07:16:03 - INFO - [Epoch 1.31] [Step 120] loss: 2.7842
04/02/2025 07:16:26 - INFO - [Epoch 1.42] [Step 130] loss: 2.7768
04/02/2025 07:16:48 - INFO - [Epoch 1.53] [Step 140] loss: 2.7753
04/02/2025 07:17:11 - INFO - [Epoch 1.64] [Step 150] loss: 2.7787
04/02/2025 07:17:33 - INFO - [Epoch 1.75] [Step 160] loss: 2.7740
04/02/2025 07:17:56 - INFO - [Epoch 1.85] [Step 170] loss: 2.7716
04/02/2025 07:18:18 - INFO - [Epoch 1.96] [Step 180] loss: 2.7539
04/02/2025 07:20:10 - INFO - [Epoch 2.07] [Step 190] loss: 2.6976
04/02/2025 07:20:33 - INFO - [Epoch 2.18] [Step 200] loss: 2.6525
04/02/2025 07:20:56 - INFO - [Epoch 2.28] [Step 210] loss: 2.6456
04/02/2025 07:21:19 - INFO - [Epoch 2.39] [Step 220] loss: 2.6509
04/02/2025 07:21:42 - INFO - [Epoch 2.50] [Step 230] loss: 2.6692
04/02/2025 07:22:04 - INFO - [Epoch 2.61] [Step 240] loss: 2.6591
04/02/2025 07:22:27 - INFO - [Epoch 2.72] [Step 250] loss: 2.6635
04/02/2025 07:22:50 - INFO - [Epoch 2.83] [Step 260] loss: 2.6684
04/02/2025 07:23:12 - INFO - [Epoch 2.94] [Step 270] loss: 2.6692
04/02/2025 07:25:16 - INFO - Training complete. Logging system usage...
04/02/2025 07:25:16 - INFO - >> System Usage - CPU: 2.8%, RAM: 3.2%, SSD: 76.20GB / 1888.43GB
04/02/2025 07:25:16 - INFO - >> GPU 0: 73.78 GB used
04/02/2025 07:25:16 - INFO - >> GPU 1: 79.22 GB used
04/02/2025 07:25:16 - INFO - >> GPU 2: 74.50 GB used
04/02/2025 07:25:16 - INFO - >> GPU 3: 73.50 GB used
04/02/2025 07:25:16 - INFO - >> GPU 4: 73.44 GB used
04/02/2025 07:25:16 - INFO - >> GPU 5: 73.24 GB used
04/02/2025 07:25:16 - INFO - >> Total GPU Memory Used: 447.68 GB
04/02/2025 07:25:16 - INFO - >> Total GPU Power Consumption: 531.29 W
04/02/2025 07:27:22 - INFO - Training completed in 0h 16m 48s
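Once a run like this finishes, the adapter saved under the output directory can typically be reloaded for inference with peft. A minimal sketch, assuming the final adapter sits in the run directory itself (with save_strategy: epoch it may instead live in a checkpoint-* subdirectory):

# Hedged sketch: loads the trained LoRA adapter on top of the base model for inference.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_dir = "output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA"  # or a checkpoint-* subdirectory
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # needed in practice for a 27B model; requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-27b-pt")

inputs = tokenizer("An example prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))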