2025-08-29 11:12:31,126 - __main__ - INFO - 📊 Configuration:
2025-08-29 11:12:31,127 - __main__ - INFO - Model: VLSP2025-LegalSML/qwen3-1.7b-legal-pretrain
2025-08-29 11:12:31,127 - __main__ - INFO - Dataset: thangvip/combined-vietnamese-legal-qa-pretrain-tokenized-8k
2025-08-29 11:12:31,127 - __main__ - INFO - Training mode: Full parameter training
2025-08-29 11:12:31,127 - __main__ - INFO - Distributed strategy: DDP (DistributedDataParallel)
2025-08-29 11:12:31,127 - __main__ - INFO - Accelerator state: Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
2025-08-29 11:12:31,127 - __main__ - INFO - Number of processes: 1
2025-08-29 11:12:31,127 - __main__ - INFO - Device: cuda
2025-08-29 11:12:31,127 - __main__ - INFO - Mixed precision: bf16
2025-08-29 11:12:31,129 - __main__ - INFO - 📚 Loading tokenizer...
2025-08-29 11:12:32,275 - __main__ - INFO - 🔧 Loading model...
2025-08-29 11:12:33,886 - __main__ - INFO - 🔥 Full Parameter Training Enabled
2025-08-29 11:12:33,887 - __main__ - INFO - Total parameters: 1,720,574,976
2025-08-29 11:12:33,887 - __main__ - INFO - Trainable parameters: 1,720,574,976
2025-08-29 11:12:33,887 - __main__ - INFO - Trainable %: 100.00%
2025-08-29 11:12:33,890 - __main__ - INFO - 📊 Preparing dataset...
2025-08-29 11:12:39,895 - __main__ - INFO - Dataset size: 60238 training examples
2025-08-29 11:12:40,030 - __main__ - INFO - 🎯 Creating Trainer for preprocessed dataset...
2025-08-29 11:12:40,863 - __main__ - INFO - 🚂 Starting training with preprocessed dataset...
2025-08-30 10:12:16,001 - __main__ - INFO - 💾 Saving final model...
2025-08-30 10:12:37,953 - __main__ - INFO - 🚀 Pushing model to Hugging Face Hub: thangvip/qwen3-1.7b-legal-pretrain-synthetic-8k
2025-08-30 10:12:37,954 - __main__ - WARNING - ❌ HF_TOKEN not found. Cannot push to hub.
2025-08-30 10:12:37,954 - __main__ - INFO - ✅ Modal training completed successfully!
2025-08-30 10:12:37,954 - __main__ - INFO - 📊 TensorBoard logs: ./data/outputs/qwen3-1.7b-legal-pretrain-synthetic-8k/logs
2025-08-30 10:12:37,954 - __main__ - INFO - 📝 Training logs: ./data/outputs/qwen3-1.7b-legal-pretrain-synthetic-8k/training.log
2025-08-30 22:56:07,048 - __main__ - INFO - 📊 Configuration:
2025-08-30 22:56:07,048 - __main__ - INFO - Model: VLSP2025-LegalSML/qwen3-1.7b-legal-pretrain
2025-08-30 22:56:07,048 - __main__ - INFO - Dataset: thangvip/combined-vietnamese-legal-qa-pretrain-tokenized-8k
2025-08-30 22:56:07,048 - __main__ - INFO - Training mode: Full parameter training
2025-08-30 22:56:07,048 - __main__ - INFO - Distributed strategy: DDP (DistributedDataParallel)
2025-08-30 22:56:07,048 - __main__ - INFO - Accelerator state: Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
2025-08-30 22:56:07,048 - __main__ - INFO - Number of processes: 1
2025-08-30 22:56:07,048 - __main__ - INFO - Device: cuda
2025-08-30 22:56:07,048 - __main__ - INFO - Mixed precision: bf16
2025-08-30 22:56:07,053 - __main__ - INFO - 📚 Loading tokenizer...
2025-08-30 22:56:07,960 - __main__ - INFO - 🔧 Loading model...
2025-08-30 22:56:09,897 - __main__ - INFO - 🔥 Full Parameter Training Enabled
2025-08-30 22:56:09,897 - __main__ - INFO - Total parameters: 1,720,574,976
2025-08-30 22:56:09,897 - __main__ - INFO - Trainable parameters: 1,720,574,976
2025-08-30 22:56:09,897 - __main__ - INFO - Trainable %: 100.00%
2025-08-30 22:56:09,899 - __main__ - INFO - 📊 Preparing dataset...
2025-08-30 22:56:14,199 - __main__ - INFO - Dataset size: 60238 training examples
2025-08-30 22:56:14,355 - __main__ - INFO - 🎯 Creating Trainer for preprocessed dataset...
2025-08-30 22:56:29,238 - __main__ - INFO - 📊 Configuration:
2025-08-30 22:56:29,239 - __main__ - INFO - Model: VLSP2025-LegalSML/qwen3-1.7b-legal-pretrain
2025-08-30 22:56:29,239 - __main__ - INFO - Dataset: thangvip/combined-vietnamese-legal-qa-pretrain-tokenized-8k
2025-08-30 22:56:29,239 - __main__ - INFO - Training mode: Full parameter training
2025-08-30 22:56:29,239 - __main__ - INFO - Distributed strategy: DDP (DistributedDataParallel)
2025-08-30 22:56:29,239 - __main__ - INFO - Accelerator state: Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: bf16
2025-08-30 22:56:29,239 - __main__ - INFO - Number of processes: 1
2025-08-30 22:56:29,239 - __main__ - INFO - Device: cuda
2025-08-30 22:56:29,239 - __main__ - INFO - Mixed precision: bf16
2025-08-30 22:56:29,242 - __main__ - INFO - 📚 Loading tokenizer...
2025-08-30 22:56:30,073 - __main__ - INFO - 🔧 Loading model...
2025-08-30 22:56:31,400 - __main__ - INFO - 🔥 Full Parameter Training Enabled
2025-08-30 22:56:31,400 - __main__ - INFO - Total parameters: 1,720,574,976
2025-08-30 22:56:31,400 - __main__ - INFO - Trainable parameters: 1,720,574,976
2025-08-30 22:56:31,400 - __main__ - INFO - Trainable %: 100.00%
2025-08-30 22:56:31,402 - __main__ - INFO - 📊 Preparing dataset...
2025-08-30 22:56:37,577 - __main__ - INFO - Dataset size: 60238 training examples
2025-08-30 22:56:37,727 - __main__ - INFO - 🎯 Creating Trainer for preprocessed dataset...
2025-08-30 22:56:38,590 - __main__ - INFO - 🔄 Resuming from checkpoint: data/outputs/qwen3-1.7b-legal-pretrain-synthetic-8k/checkpoint-236
2025-08-30 22:56:38,590 - __main__ - INFO - 🚂 Starting training with preprocessed dataset...
2025-08-30 22:56:42,610 - __main__ - INFO - 💾 Saving final model...
2025-08-30 22:57:04,074 - __main__ - INFO - 🚀 Pushing model to Hugging Face Hub: thangvip/qwen3-1.7b-legal-pretrain-synthetic-8k