04/02/2025 07:09:10 - INFO - Train data file: finetuning_data_25_sentences.json
04/02/2025 07:09:10 - INFO - Output Directory: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:09:10 - INFO - Experiment name: KETI_b1_s4_e3
04/02/2025 07:09:10 - INFO - torch_dtype: torch.bfloat16
04/02/2025 07:09:10 - INFO - Start inference on base model: google/gemma-3-27b-it
04/02/2025 07:10:10 - INFO - base_model and tokenizer released from memory
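For context, the load-then-release pattern logged above typically looks like the sketch below (a hypothetical reconstruction; the actual script is not shown in this log). The model id and torch_dtype come straight from the lines above:

```python
import gc
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bfloat16, matching "torch_dtype: torch.bfloat16" above.
model_id = "google/gemma-3-27b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the 27B model across the available GPUs
)

# ... run baseline inference on the evaluation prompts ...

# Release the model and tokenizer before training starts,
# matching the "released from memory" log line above.
del base_model, tokenizer
gc.collect()
torch.cuda.empty_cache()
```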
04/02/2025 07:10:10 - INFO - Using 6 GPU(s): NVIDIA A100-SXM4-80GB
04/02/2025 07:10:11 - INFO - Training samples: 37822
04/02/2025 07:10:11 - INFO - Evaluation samples: 4203
04/02/2025 07:10:11 - INFO - Steps per epoch: 1575
04/02/2025 07:10:11 - INFO - Total training steps: 4725
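These counts are consistent with an effective batch size of 24: per-device batch 1 × gradient accumulation 4 (both logged in the SFT configuration further below) × 6 GPUs. A quick arithmetic check; the floor division is an assumption inferred from the logged value of 1575:

```python
samples = 37_822          # Training samples (logged above)
per_device_batch = 1      # per_device_train_batch_size
grad_accum = 4            # gradient_accumulation_steps
num_gpus = 6              # A100s in use
epochs = 3                # num_train_epochs

effective_batch = per_device_batch * grad_accum * num_gpus  # 24
steps_per_epoch = samples // effective_batch                # 37822 // 24 = 1575
total_steps = steps_per_epoch * epochs                      # 1575 * 3 = 4725
```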
04/02/2025 07:10:11 - INFO - Training in FFT or LoRA mode.
04/02/2025 07:10:24 - INFO - Initializing LoRA model...
04/02/2025 07:10:27 - INFO - LoRA Configuration:
04/02/2025 07:10:27 - INFO - - task_type: CAUSAL_LM
04/02/2025 07:10:27 - INFO - - peft_type: PeftType.LORA
04/02/2025 07:10:27 - INFO - - auto_mapping: None
04/02/2025 07:10:27 - INFO - - base_model_name_or_path: google/gemma-3-27b-pt
04/02/2025 07:10:27 - INFO - - revision: None
04/02/2025 07:10:27 - INFO - - inference_mode: False
04/02/2025 07:10:27 - INFO - - r: 32
04/02/2025 07:10:27 - INFO - - target_modules: {'q_proj', 'v_proj', 'k_proj', 'gate_proj', 'down_proj', 'up_proj', 'o_proj'}
04/02/2025 07:10:27 - INFO - - exclude_modules: None
04/02/2025 07:10:27 - INFO - - lora_alpha: 16
04/02/2025 07:10:27 - INFO - - lora_dropout: 0.05
04/02/2025 07:10:27 - INFO - - fan_in_fan_out: False
04/02/2025 07:10:27 - INFO - - bias: none
04/02/2025 07:10:27 - INFO - - use_rslora: False
04/02/2025 07:10:27 - INFO - - modules_to_save: None
04/02/2025 07:10:27 - INFO - - init_lora_weights: True
04/02/2025 07:10:27 - INFO - - layers_to_transform: None
04/02/2025 07:10:27 - INFO - - layers_pattern: None
04/02/2025 07:10:27 - INFO - - rank_pattern: {}
04/02/2025 07:10:27 - INFO - - alpha_pattern: {}
04/02/2025 07:10:27 - INFO - - megatron_config: None
04/02/2025 07:10:27 - INFO - - megatron_core: megatron.core
04/02/2025 07:10:27 - INFO - - trainable_token_indices: None
04/02/2025 07:10:27 - INFO - - loftq_config: {}
04/02/2025 07:10:27 - INFO - - eva_config: None
04/02/2025 07:10:27 - INFO - - corda_config: None
04/02/2025 07:10:27 - INFO - - use_dora: False
04/02/2025 07:10:27 - INFO - - layer_replication: None
04/02/2025 07:10:27 - INFO - - lora_bias: False
04/02/2025 07:10:27 - INFO - Trainable params: 227033088 / 27236379392 (0.83%)
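The dump above maps onto a peft.LoraConfig roughly like the sketch below (reconstructed from the logged values; the script's actual code is not shown). Adapters are attached to every attention and MLP projection, and with lora_alpha=16 over r=32 the effective LoRA scaling alpha/r is 0.5:

```python
from peft import LoraConfig, get_peft_model

# Values taken verbatim from the LoRA configuration logged above.
lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# model = get_peft_model(base_model, lora_config)
# model.print_trainable_parameters()
# -> ~227,033,088 trainable of 27,236,379,392 total (0.83%), as logged above.
```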
04/02/2025 07:10:27 - INFO - SFT Configuration:
04/02/2025 07:10:27 - INFO - - output_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO - - overwrite_output_dir: False
04/02/2025 07:10:27 - INFO - - do_train: False
04/02/2025 07:10:27 - INFO - - do_eval: False
04/02/2025 07:10:27 - INFO - - do_predict: False
04/02/2025 07:10:27 - INFO - - eval_strategy: no
04/02/2025 07:10:27 - INFO - - prediction_loss_only: False
04/02/2025 07:10:27 - INFO - - per_device_train_batch_size: 1
04/02/2025 07:10:27 - INFO - - per_device_eval_batch_size: 8
04/02/2025 07:10:27 - INFO - - per_gpu_train_batch_size: None
04/02/2025 07:10:27 - INFO - - per_gpu_eval_batch_size: None
04/02/2025 07:10:27 - INFO - - gradient_accumulation_steps: 4
04/02/2025 07:10:27 - INFO - - eval_accumulation_steps: None
04/02/2025 07:10:27 - INFO - - eval_delay: 0
04/02/2025 07:10:27 - INFO - - torch_empty_cache_steps: None
04/02/2025 07:10:27 - INFO - - learning_rate: 0.0001
04/02/2025 07:10:27 - INFO - - weight_decay: 0.0
04/02/2025 07:10:27 - INFO - - adam_beta1: 0.9
04/02/2025 07:10:27 - INFO - - adam_beta2: 0.999
04/02/2025 07:10:27 - INFO - - adam_epsilon: 1e-08
04/02/2025 07:10:27 - INFO - - max_grad_norm: 0.3
04/02/2025 07:10:27 - INFO - - num_train_epochs: 3
04/02/2025 07:10:27 - INFO - - max_steps: -1
04/02/2025 07:10:27 - INFO - - lr_scheduler_type: constant
04/02/2025 07:10:27 - INFO - - lr_scheduler_kwargs: {}
04/02/2025 07:10:27 - INFO - - warmup_ratio: 0.03
04/02/2025 07:10:27 - INFO - - warmup_steps: 0
04/02/2025 07:10:27 - INFO - - log_level: passive
04/02/2025 07:10:27 - INFO - - log_level_replica: warning
04/02/2025 07:10:27 - INFO - - log_on_each_node: True
04/02/2025 07:10:27 - INFO - - logging_dir: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA/runs/Apr02_07-10-27_llm-server-779876f58-9zzqd
04/02/2025 07:10:27 - INFO - - logging_strategy: steps
04/02/2025 07:10:27 - INFO - - logging_first_step: False
04/02/2025 07:10:27 - INFO - - logging_steps: 10
04/02/2025 07:10:27 - INFO - - logging_nan_inf_filter: True
04/02/2025 07:10:27 - INFO - - save_strategy: epoch
04/02/2025 07:10:27 - INFO - - save_steps: 500
04/02/2025 07:10:27 - INFO - - save_total_limit: None
04/02/2025 07:10:27 - INFO - - save_safetensors: True
04/02/2025 07:10:27 - INFO - - save_on_each_node: False
04/02/2025 07:10:27 - INFO - - save_only_model: False
04/02/2025 07:10:27 - INFO - - restore_callback_states_from_checkpoint: False
04/02/2025 07:10:27 - INFO - - no_cuda: False
04/02/2025 07:10:27 - INFO - - use_cpu: False
04/02/2025 07:10:27 - INFO - - use_mps_device: False
04/02/2025 07:10:27 - INFO - - seed: 42
04/02/2025 07:10:27 - INFO - - data_seed: None
04/02/2025 07:10:27 - INFO - - jit_mode_eval: False
04/02/2025 07:10:27 - INFO - - use_ipex: False
04/02/2025 07:10:27 - INFO - - bf16: True
04/02/2025 07:10:27 - INFO - - fp16: False
04/02/2025 07:10:27 - INFO - - fp16_opt_level: O1
04/02/2025 07:10:27 - INFO - - half_precision_backend: auto
04/02/2025 07:10:27 - INFO - - bf16_full_eval: False
04/02/2025 07:10:27 - INFO - - fp16_full_eval: False
04/02/2025 07:10:27 - INFO - - tf32: None
04/02/2025 07:10:27 - INFO - - local_rank: 0
04/02/2025 07:10:27 - INFO - - ddp_backend: None
04/02/2025 07:10:27 - INFO - - tpu_num_cores: None
04/02/2025 07:10:27 - INFO - - tpu_metrics_debug: False
04/02/2025 07:10:27 - INFO - - debug: []
04/02/2025 07:10:27 - INFO - - dataloader_drop_last: False
04/02/2025 07:10:27 - INFO - - eval_steps: None
04/02/2025 07:10:27 - INFO - - dataloader_num_workers: 0
04/02/2025 07:10:27 - INFO - - dataloader_prefetch_factor: None
04/02/2025 07:10:27 - INFO - - past_index: -1
04/02/2025 07:10:27 - INFO - - run_name: output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA
04/02/2025 07:10:27 - INFO - - disable_tqdm: False
04/02/2025 07:10:27 - INFO - - remove_unused_columns: True
04/02/2025 07:10:27 - INFO - - label_names: ['labels']
04/02/2025 07:10:27 - INFO - - load_best_model_at_end: False
04/02/2025 07:10:27 - INFO - - metric_for_best_model: None
04/02/2025 07:10:27 - INFO - - greater_is_better: None
04/02/2025 07:10:27 - INFO - - ignore_data_skip: False
04/02/2025 07:10:27 - INFO - - fsdp: []
04/02/2025 07:10:27 - INFO - - fsdp_min_num_params: 0
04/02/2025 07:10:27 - INFO - - fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
04/02/2025 07:10:27 - INFO - - tp_size: 0
04/02/2025 07:10:27 - INFO - - fsdp_transformer_layer_cls_to_wrap: None
04/02/2025 07:10:27 - INFO - - accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
04/02/2025 07:10:27 - INFO - - deepspeed: None
04/02/2025 07:10:27 - INFO - - label_smoothing_factor: 0.0
04/02/2025 07:10:27 - INFO - - optim: adamw_torch_fused
04/02/2025 07:10:27 - INFO - - optim_args: None
04/02/2025 07:10:27 - INFO - - adafactor: False
04/02/2025 07:10:27 - INFO - - group_by_length: False
04/02/2025 07:10:27 - INFO - - length_column_name: length
04/02/2025 07:10:27 - INFO - - report_to: ['tensorboard']
04/02/2025 07:10:27 - INFO - - ddp_find_unused_parameters: None
04/02/2025 07:10:27 - INFO - - ddp_bucket_cap_mb: None
04/02/2025 07:10:27 - INFO - - ddp_broadcast_buffers: None
04/02/2025 07:10:27 - INFO - - dataloader_pin_memory: True
04/02/2025 07:10:27 - INFO - - dataloader_persistent_workers: False
04/02/2025 07:10:27 - INFO - - skip_memory_metrics: True
04/02/2025 07:10:27 - INFO - - use_legacy_prediction_loop: False
04/02/2025 07:10:27 - INFO - - push_to_hub: True
04/02/2025 07:10:27 - INFO - - resume_from_checkpoint: None
04/02/2025 07:10:27 - INFO - - hub_model_id: None
04/02/2025 07:10:27 - INFO - - hub_strategy: every_save
04/02/2025 07:10:27 - INFO - - hub_token: <HUB_TOKEN>
04/02/2025 07:10:27 - INFO - - hub_private_repo: None
04/02/2025 07:10:27 - INFO - - hub_always_push: False
04/02/2025 07:10:27 - INFO - - gradient_checkpointing: False
04/02/2025 07:10:27 - INFO - - gradient_checkpointing_kwargs: None
04/02/2025 07:10:27 - INFO - - include_inputs_for_metrics: False
04/02/2025 07:10:27 - INFO - - include_for_metrics: []
04/02/2025 07:10:27 - INFO - - eval_do_concat_batches: True
04/02/2025 07:10:27 - INFO - - fp16_backend: auto
04/02/2025 07:10:27 - INFO - - evaluation_strategy: None
04/02/2025 07:10:27 - INFO - - push_to_hub_model_id: None
04/02/2025 07:10:27 - INFO - - push_to_hub_organization: None
04/02/2025 07:10:27 - INFO - - push_to_hub_token: <PUSH_TO_HUB_TOKEN>
04/02/2025 07:10:27 - INFO - - mp_parameters:
04/02/2025 07:10:27 - INFO - - auto_find_batch_size: False
04/02/2025 07:10:27 - INFO - - full_determinism: False
04/02/2025 07:10:27 - INFO - - torchdynamo: None
04/02/2025 07:10:27 - INFO - - ray_scope: last
04/02/2025 07:10:27 - INFO - - ddp_timeout: 1800
04/02/2025 07:10:27 - INFO - - torch_compile: False
04/02/2025 07:10:27 - INFO - - torch_compile_backend: None
04/02/2025 07:10:27 - INFO - - torch_compile_mode: None
04/02/2025 07:10:27 - INFO - - dispatch_batches: None
04/02/2025 07:10:27 - INFO - - split_batches: None
04/02/2025 07:10:27 - INFO - - include_tokens_per_second: False
04/02/2025 07:10:27 - INFO - - include_num_input_tokens_seen: False
04/02/2025 07:10:27 - INFO - - neftune_noise_alpha: None
04/02/2025 07:10:27 - INFO - - optim_target_modules: None
04/02/2025 07:10:27 - INFO - - batch_eval_metrics: False
04/02/2025 07:10:27 - INFO - - eval_on_start: False
04/02/2025 07:10:27 - INFO - - use_liger_kernel: False
04/02/2025 07:10:27 - INFO - - eval_use_gather_object: False
04/02/2025 07:10:27 - INFO - - average_tokens_across_devices: False
04/02/2025 07:10:27 - INFO - - model_init_kwargs: None
04/02/2025 07:10:27 - INFO - - dataset_text_field: text
04/02/2025 07:10:27 - INFO - - dataset_kwargs: {'add_special_tokens': False, 'append_concat_token': True}
04/02/2025 07:10:27 - INFO - - dataset_num_proc: None
04/02/2025 07:10:27 - INFO - - max_length: 512
04/02/2025 07:10:27 - INFO - - packing: True
04/02/2025 07:10:27 - INFO - - padding_free: False
04/02/2025 07:10:27 - INFO - - eval_packing: None
04/02/2025 07:10:27 - INFO - - dataset_batch_size: None
04/02/2025 07:10:27 - INFO - - num_of_sequences: None
04/02/2025 07:10:27 - INFO - - chars_per_token: <CHARS_PER_TOKEN>
04/02/2025 07:10:27 - INFO - - max_seq_length: 512
04/02/2025 07:10:27 - INFO - - use_liger: None
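The dump above corresponds to a TRL SFTConfig along these lines (a minimal sketch using only the hyperparameters logged above; exact field names such as max_seq_length vs. max_length vary between trl versions):

```python
from trl import SFTConfig, SFTTrainer

# Key hyperparameters reconstructed from the SFT configuration logged above.
sft_config = SFTConfig(
    output_dir="output/gemma-3-27b-pt/20250402_070854_gemma-3-27b-pt_LoRA",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    num_train_epochs=3,
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    max_grad_norm=0.3,
    bf16=True,
    optim="adamw_torch_fused",
    logging_steps=10,
    save_strategy="epoch",
    report_to=["tensorboard"],
    seed=42,
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,
)

# trainer = SFTTrainer(model=model, args=sft_config,
#                      train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```

Note that packing=True concatenates the short training sentences into dense 512-token sequences, which is why the optimizer-step count actually observed below (roughly 90 per epoch) falls well short of the pre-packing estimate of 1575 steps per epoch.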
04/02/2025 07:10:32 - INFO - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmp7kmj2aos/test.c -o /tmp/tmp7kmj2aos/test.o
04/02/2025 07:10:32 - INFO - gcc -pthread -B /root/pai/envs/llm-finetuning/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -O2 -isystem /root/pai/envs/llm-finetuning/include -fPIC -c /tmp/tmpsq2ww307/test.c -o /tmp/tmpsq2ww307/test.o
04/02/2025 07:10:34 - INFO - Start Training !
04/02/2025 07:11:01 - INFO - [Epoch 0.11] [Step 10] loss: 3.8091
04/02/2025 07:11:25 - INFO - [Epoch 0.22] [Step 20] loss: 3.1515
04/02/2025 07:11:48 - INFO - [Epoch 0.33] [Step 30] loss: 3.0086
04/02/2025 07:12:11 - INFO - [Epoch 0.44] [Step 40] loss: 2.9523
04/02/2025 07:12:34 - INFO - [Epoch 0.55] [Step 50] loss: 2.9285
04/02/2025 07:12:56 - INFO - [Epoch 0.66] [Step 60] loss: 2.9137
04/02/2025 07:13:19 - INFO - [Epoch 0.77] [Step 70] loss: 2.8934
04/02/2025 07:13:42 - INFO - [Epoch 0.88] [Step 80] loss: 2.8740
04/02/2025 07:14:05 - INFO - [Epoch 0.99] [Step 90] loss: 2.8733
04/02/2025 07:15:17 - INFO - [Epoch 1.09] [Step 100] loss: 2.7949
04/02/2025 07:15:40 - INFO - [Epoch 1.20] [Step 110] loss: 2.7914
04/02/2025 07:16:03 - INFO - [Epoch 1.31] [Step 120] loss: 2.7842
04/02/2025 07:16:26 - INFO - [Epoch 1.42] [Step 130] loss: 2.7768
04/02/2025 07:16:48 - INFO - [Epoch 1.53] [Step 140] loss: 2.7753
04/02/2025 07:17:11 - INFO - [Epoch 1.64] [Step 150] loss: 2.7787
04/02/2025 07:17:33 - INFO - [Epoch 1.75] [Step 160] loss: 2.7740
04/02/2025 07:17:56 - INFO - [Epoch 1.85] [Step 170] loss: 2.7716
04/02/2025 07:18:18 - INFO - [Epoch 1.96] [Step 180] loss: 2.7539
04/02/2025 07:20:10 - INFO - [Epoch 2.07] [Step 190] loss: 2.6976
04/02/2025 07:20:33 - INFO - [Epoch 2.18] [Step 200] loss: 2.6525
04/02/2025 07:20:56 - INFO - [Epoch 2.28] [Step 210] loss: 2.6456
04/02/2025 07:21:19 - INFO - [Epoch 2.39] [Step 220] loss: 2.6509
04/02/2025 07:21:42 - INFO - [Epoch 2.50] [Step 230] loss: 2.6692
04/02/2025 07:22:04 - INFO - [Epoch 2.61] [Step 240] loss: 2.6591
04/02/2025 07:22:27 - INFO - [Epoch 2.72] [Step 250] loss: 2.6635
04/02/2025 07:22:50 - INFO - [Epoch 2.83] [Step 260] loss: 2.6684
04/02/2025 07:23:12 - INFO - [Epoch 2.94] [Step 270] loss: 2.6692
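The per-step lines above follow the pattern "[Epoch e] [Step s] loss: x"; one way such lines are produced is a transformers TrainerCallback like this sketch (hypothetical; the script's actual logger is not shown):

```python
from transformers import TrainerCallback

class LossConsoleCallback(TrainerCallback):
    """Echo the running loss each time the Trainer logs (every logging_steps=10)."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            print(f"[Epoch {state.epoch:.2f}] [Step {state.global_step}] "
                  f"loss: {logs['loss']:.4f}")

# Registered once before training, e.g. trainer.add_callback(LossConsoleCallback())
```

The curve itself behaves as expected under a constant learning rate: a steep drop in epoch 1 (3.81 to 2.87), then a plateau around 2.66 by epoch 3.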
04/02/2025 07:25:16 - INFO - Training complete. Logging system usage...
04/02/2025 07:25:16 - INFO - >> System Usage - CPU: 2.8%, RAM: 3.2%, SSD: 76.20GB / 1888.43GB
04/02/2025 07:25:16 - INFO - >> GPU 0: 73.78 GB used
04/02/2025 07:25:16 - INFO - >> GPU 1: 79.22 GB used
04/02/2025 07:25:16 - INFO - >> GPU 2: 74.50 GB used
04/02/2025 07:25:16 - INFO - >> GPU 3: 73.50 GB used
04/02/2025 07:25:16 - INFO - >> GPU 4: 73.44 GB used
04/02/2025 07:25:16 - INFO - >> GPU 5: 73.24 GB used
04/02/2025 07:25:16 - INFO - >> Total GPU Memory Used: 447.68 GB
04/02/2025 07:25:16 - INFO - >> Total GPU Power Consumption: 531.29 W
04/02/2025 07:27:22 - INFO - Training completed in 0h 16m 48s
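The system-usage summary above is the kind of report that can be assembled from psutil and NVML; below is a minimal sketch under that assumption (the actual utility used by the script is not shown):

```python
import psutil
import pynvml

def log_system_usage() -> None:
    # CPU / RAM / disk, as in the ">> System Usage" line above.
    disk = psutil.disk_usage("/")
    print(f">> System Usage - CPU: {psutil.cpu_percent()}%, "
          f"RAM: {psutil.virtual_memory().percent}%, "
          f"SSD: {disk.used / 1e9:.2f}GB / {disk.total / 1e9:.2f}GB")

    # Per-GPU memory and power via NVML (pynvml).
    pynvml.nvmlInit()
    total_mem_gb = 0.0
    total_power_w = 0.0
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        used_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024**3
        total_mem_gb += used_gb
        total_power_w += pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # mW -> W
        print(f">> GPU {i}: {used_gb:.2f} GB used")
    print(f">> Total GPU Memory Used: {total_mem_gb:.2f} GB")
    print(f">> Total GPU Power Consumption: {total_power_w:.2f} W")
    pynvml.nvmlShutdown()
```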