Update README.md
**Output**: The generated videos will be saved in the `wanx/outputs/` directory.

## 🔧 Training Process

### Step 1: Prompt Preprocessing

Before training, you need to preprocess the text prompts to generate embeddings.

#### CogVideoX Preprocessing
```bash
cd utils
python process_prompts_cogvideox.py \
    --input_file your_prompts.txt \
    --output_dir ../cogvideox/prompts \
    --model_path ../cogvideox/CogVideoX-5b \
    --batch_size 32 \
    --save_separate
```
**Argument Descriptions**:

- `--input_file`: A `.txt` file containing prompts, with one prompt per line (see the example below).
- `--output_dir`: The directory to save the output embeddings.
- `--model_path`: Path to the CogVideoX model.
- `--batch_size`: The batch size for processing.
- `--save_separate`: Whether to save each embedding as a separate file.
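
For example, the prompts file is plain text with one prompt per line. The sketch below creates a small `your_prompts.txt`; the prompt texts are placeholders for illustration only, not part of the repository.

```bash
# Create a small prompts file (one prompt per line); the prompts themselves
# are illustrative placeholders.
cat > your_prompts.txt << 'EOF'
A corgi running along a beach at sunset, cinematic lighting.
A timelapse of clouds rolling over a mountain ridge.
A robot watering plants in a greenhouse, shallow depth of field.
EOF
```

Pass this file via `--input_file` as in the command above; with `--save_separate` set, each prompt should end up as its own embedding file under `../cogvideox/prompts/`.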
#### WanX Preprocessing

```bash
cd utils
python process_prompts_wanx.py
```
This script will automatically process the prompts in `utils/all_dimension_aug_wanx.txt` and generate the corresponding embeddings.
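
To preprocess your own prompts for WanX instead, one simple option is to swap in your own file before running the script. This is only a sketch: it assumes the script reads `utils/all_dimension_aug_wanx.txt` exactly as described above and takes no command-line arguments.

```bash
# Back up the default prompt list, substitute your own prompts (one per line),
# then run the preprocessing script as usual. File names are examples.
cd utils
cp all_dimension_aug_wanx.txt all_dimension_aug_wanx.txt.bak
cp /path/to/your_prompts.txt all_dimension_aug_wanx.txt
python process_prompts_wanx.py
```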
### Step 2: Start Training

#### CogVideoX Training
```bash
cd cogvideox
bash train_tdm_1.sh
```
**Core Training Parameters**:
```bash
# If you are not training with 8 GPUs, you must modify CUDA_VISIBLE_DEVICES below
# and the num_processes setting in train/config.yaml to match.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --config_file train/config.yaml \
    train/train_cogvideo_tdm.py \
    --pretrained_model_name_or_path CogVideoX-5b \
    --mixed_precision bf16 \
    --train_batch_size 5 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-4 \
    --learning_rate_g 1e-4 \
    --learning_rate_fake 5e-4 \
    --lambda_reg 0.5 \
    --k_step 8 \
    --cfg 3.5 \
    --eta 0.9 \
    --use_sparsity true \
    --rank 64 \
    --lora_alpha 64 \
    --max_train_steps 300 \
    --checkpointing_steps 15 \
    --gradient_checkpointing \
    --enable_slicing \
    --enable_tiling
```

**Argument Descriptions**:

- `--pretrained_model_name_or_path`: Path to the base model.
- `--mixed_precision bf16`: Use mixed precision to reduce memory usage.
- `--train_batch_size`: Training batch size.
- `--gradient_accumulation_steps`: Number of gradient accumulation steps.
- `--learning_rate`: Learning rate for the student model.
- `--learning_rate_fake`: Learning rate for the fake model.
- `--lambda_reg`: Regularization weight.
- `--k_step`: Target number of steps for distillation.
- `--cfg`: Classifier-Free Guidance scale.
- `--eta`: ETA parameter for DDIM.
- `--use_sparsity`: Enable sparse attention.
- `--rank`, `--lora_alpha`: LoRA configuration.
- `--max_train_steps`: Maximum number of training steps.
- `--checkpointing_steps`: Interval (in steps) for saving checkpoints.
- `--gradient_checkpointing`: Use gradient checkpointing to save memory.
- `--enable_slicing`, `--enable_tiling`: VAE memory optimizations.
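
With the settings above, and assuming `--train_batch_size` is a per-GPU value (the usual convention for `accelerate`-based training scripts), the effective batch size is `train_batch_size × gradient_accumulation_steps × num_GPUs = 5 × 4 × 8 = 160`. If you train on fewer GPUs, shrink `CUDA_VISIBLE_DEVICES`, update `num_processes` in `train/config.yaml` to match, and optionally raise `--gradient_accumulation_steps` to keep the effective batch size constant. A minimal sketch for a 4-GPU machine, assuming the config stores the worker count under a `num_processes:` key (the field referenced in the comment above):

```bash
# Check and update the process count in the accelerate config so it matches the
# number of devices you expose via CUDA_VISIBLE_DEVICES (4 in this example).
grep -n "num_processes" train/config.yaml
sed -i 's/^num_processes:.*/num_processes: 4/' train/config.yaml
```

Then launch the same command as above with `CUDA_VISIBLE_DEVICES=0,1,2,3` and, for an unchanged effective batch size, `--gradient_accumulation_steps 8` (5 × 8 × 4 = 160).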
#### WanX Training

```bash
cd wanx
bash train_wanx_tdm.sh
```
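
Unlike the CogVideoX section, the launch parameters for WanX are not listed in this README and presumably live inside `train_wanx_tdm.sh`, so it may be worth checking them against your hardware before the first run. A small sketch; the search patterns are assumptions carried over from the CogVideoX setup above, not documented contents of the script:

```bash
# Inspect the WanX launch script for GPU / process settings before running it.
cd wanx
grep -n -E "CUDA_VISIBLE_DEVICES|num_processes|accelerate launch" train_wanx_tdm.sh
```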
## 📊 Project Structure