GYP666 committed
Commit 1deef16 · verified · 1 Parent(s): d9dce4d

Update README.md

Files changed (1): README.md +204 -1
README.md CHANGED
@@ -1,4 +1,207 @@
  ---
  license: apache-2.0
  ---
- This repository contains the official model weights for Video-BLADE, a framework for efficient video generation achieved through the joint training of adaptive sparse attention and step distillation.
# Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation

<div align="center">

[📖 Paper](https://tacossp.github.io/BLADE-Homepage/) | [🚀 Homepage](#-quick-start) | [💾 Models](https://huggingface.co/GYP666/VIDEO-BLADE) | [📖 中文阅读 (Read in Chinese)](README_zh.md)

</div>

Video-BLADE is a data-free framework for efficient video generation. It jointly trains an adaptive block-sparse attention mechanism with step distillation, which significantly accelerates video generation models by reducing the number of inference steps from 50 to just 8 while maintaining high-quality generation.

## 📢 News

- **[Aug 2024]** 🎉 The code and pre-trained models for Video-BLADE have been released!
- **[Aug 2024]** 📝 Support for two mainstream video generation models, CogVideoX-5B and WanX-1.3B, is now available.
- **[Aug 2024]** ⚡ Achieved high-quality video generation in just 8 steps, a significant speedup compared to the 50-step baseline.

## ✨ Key Features

- 🚀 **Efficient Inference**: Reduces the number of inference steps from 50 to 8 while preserving generation quality.
- 🎯 **Adaptive Sparse Attention**: Employs a block-sparse attention mechanism to significantly reduce computational complexity (see the illustrative sketch after this list).
- 📈 **Step Distillation**: Uses Trajectory Distribution Matching (TDM), enabling training without the need for video data.
- 🎮 **Plug-and-Play**: Supports the CogVideoX-5B and WanX-1.3B models without requiring modifications to their original architectures.

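To make the block-sparse idea concrete, below is a minimal, illustrative PyTorch sketch of attention restricted by a block-level mask. It is not the fused CUDA kernel shipped in Block-Sparse-Attention, and it does not show how Video-BLADE learns which blocks to keep; the function name, block size, and toy mask are assumptions for illustration only.

```python
# Illustrative sketch only: dense math per kept block, no fused kernel.
import torch

def block_sparse_attention(q, k, v, block_mask, block_size=64):
    """q, k, v: [batch, heads, seq, dim]; block_mask: [seq//block_size, seq//block_size] (bool)."""
    _, _, s, d = q.shape
    scale = d ** -0.5
    n_blocks = s // block_size
    out = torch.zeros_like(q)
    for qi in range(n_blocks):
        q_tile = q[:, :, qi * block_size:(qi + 1) * block_size]
        kept = [ki for ki in range(n_blocks) if block_mask[qi, ki]]
        if not kept:
            continue  # this query tile attends to nothing
        # Gather only the key/value tiles this query tile is allowed to see, so the
        # score matrix is [block_size, block_size * len(kept)] instead of [block_size, seq].
        k_sel = torch.cat([k[:, :, ki * block_size:(ki + 1) * block_size] for ki in kept], dim=2)
        v_sel = torch.cat([v[:, :, ki * block_size:(ki + 1) * block_size] for ki in kept], dim=2)
        attn = torch.softmax((q_tile @ k_sel.transpose(-1, -2)) * scale, dim=-1)
        out[:, :, qi * block_size:(qi + 1) * block_size] = attn @ v_sel
    return out

# Toy usage: 256 tokens in 4 blocks; each query block attends to itself and to block 0.
q = torch.randn(1, 2, 256, 32); k = torch.randn_like(q); v = torch.randn_like(q)
mask = torch.eye(4, dtype=torch.bool)
mask[:, 0] = True
print(block_sparse_attention(q, k, v, mask).shape)  # torch.Size([1, 2, 256, 32])
```
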
## 🛠️ Environment Setup

### System Requirements

- Python >= 3.11 (Recommended)
- CUDA >= 11.6 (Recommended)
- GPU Memory >= 24GB (for Inference)
- GPU Memory >= 80GB (for Training)

### Installation Steps

1. **Clone the repository**

   ```bash
   git clone https://github.com/Tacossp/VIDEO-BLADE
   cd VIDEO-BLADE
   ```

2. **Install dependencies**

   ```bash
   # Install using uv (Recommended)
   uv pip install -r requirements.txt

   # Or use pip
   pip install -r requirements.txt
   ```

3. **Compile the Block-Sparse-Attention library**

   ```bash
   git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
   cd Block-Sparse-Attention
   pip install packaging
   pip install ninja
   python setup.py install
   cd ..
   ```

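After the build finishes, a quick way to confirm the extension was installed is to import it from Python. This is a hypothetical smoke test; the `block_sparse_attn` package name is taken from the project structure listed later in this README.

```python
# Hypothetical smoke test: verify the compiled extension is importable.
import torch              # the extension links against PyTorch's CUDA runtime
import block_sparse_attn  # installed by `python setup.py install` above

print("block_sparse_attn loaded from:", block_sparse_attn.__file__)
print("CUDA available:", torch.cuda.is_available())
```
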
## 📥 Model Weights Download

### Base Model Weights

Please download the following base model weights and place them in the specified directories:

1. **CogVideoX-5B Model**

   ```bash
   # Download from Hugging Face
   git lfs install
   git clone https://huggingface.co/zai-org/CogVideoX-5b cogvideox/CogVideoX-5b
   ```

2. **WanX-1.3B Model**

   ```bash
   # Download from Hugging Face
   git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers wanx/wan1.3b
   ```

### Pre-trained Video-BLADE Weights

We provide pre-trained weights for Video-BLADE:

```bash
# Download pre-trained weights
git clone https://huggingface.co/GYP666/VIDEO-BLADE pretrained_weights
```

### Weight Directory Structure

Ensure your directory structure for weights is as follows:

```
VIDEO-BLADE/
├── cogvideox/
│   └── CogVideoX-5b/            # Base model weights for CogVideoX
├── wanx/
│   └── wan1.3b/                 # Base model weights for WanX
└── pretrained_weights/          # Pre-trained weights for Video-BLADE
    ├── BLADE_cogvideox_weight/
    └── BLADE_wanx_weight/
```

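Before moving on, you can optionally sanity-check the downloaded CogVideoX-5b base weights with stock diffusers. The sketch below runs the plain dense-attention, 50-step pipeline (not Video-BLADE's 8-step sparse-attention path, for which you should use `cogvideox/train/inference.py`); the prompt and dtype are illustrative assumptions, and it assumes a recent diffusers release that provides `CogVideoXPipeline`.

```python
# Optional sanity check of the base CogVideoX-5b download with plain diffusers.
# This is NOT the Video-BLADE fast path; it only confirms the weights load and run.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "cogvideox/CogVideoX-5b",      # local path from the layout above
    torch_dtype=torch.bfloat16,
).to("cuda")

frames = pipe(
    prompt="A panda playing a guitar in a bamboo forest",  # illustrative prompt
    num_inference_steps=50,        # vanilla 50-step sampling
).frames[0]

export_to_video(frames, "base_model_check.mp4", fps=8)
```
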
## 🚀 Quick Start - Inference

### CogVideoX Inference

```bash
cd cogvideox
python train/inference.py \
    --lora_path ../pretrained_weights/cogvideox_checkpoints/your_checkpoint \
    --gpu 0
```

**Argument Descriptions**:

- `--lora_path`: Path to the LoRA weights file.
- `--gpu`: The ID of the GPU device to use (Default: 0).

**Output**: The generated videos will be saved in the `cogvideox/outputs/inference/` directory.

### WanX Inference

```bash
cd wanx
python train/inference.py \
    --lora_path ../pretrained_weights/wanx_checkpoints/your_checkpoint \
    --gpu 0
```

**Output**: The generated videos will be saved in the `wanx/outputs/` directory.

## 📊 Project Structure

```
VIDEO-BLADE/
├── README.md                            # Project documentation
├── requirements.txt                     # List of Python dependencies
│
├── cogvideox/                           # Code related to CogVideoX
│   ├── CogVideoX-5b/                    # Directory for base model weights
│   ├── train/                           # Training scripts
│   │   ├── inference.py                 # Inference script
│   │   ├── train_cogvideo_tdm.py        # Training script
│   │   ├── train_tdm_1.sh               # Script to launch training
│   │   ├── modify_cogvideo.py           # Model modification script
│   │   └── config.yaml                  # Training configuration file
│   ├── prompts/                         # Preprocessed prompts and embeddings
│   └── outputs/                         # Output from training and inference
│
├── wanx/                                # Code related to WanX
│   ├── wan1.3b/                         # Directory for base model weights
│   ├── train/                           # Training scripts
│   │   ├── inference.py                 # Inference script
│   │   ├── train_wanx_tdm.py            # Training script
│   │   ├── train_wanx_tdm.sh            # Script to launch training
│   │   └── modify_wan.py                # Model modification script
│   ├── prompts/                         # Preprocessed prompts and embeddings
│   └── outputs/                         # Output from training and inference
│
├── utils/                               # Utility scripts
│   ├── process_prompts_cogvideox.py     # Data preprocessing for CogVideoX
│   ├── process_prompts_wanx.py          # Data preprocessing for WanX
│   └── all_dimension_aug_wanx.txt       # Training prompts for WanX
│
├── Block-Sparse-Attention/              # Sparse attention library
│   ├── setup.py                         # Compilation and installation script
│   ├── block_sparse_attn/               # Core library code
│   └── README.md                        # Library usage instructions
│
└── ds_config.json                       # DeepSpeed configuration file
```

## 🤝 Acknowledgements

- [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Block-Sparse-Attention](https://github.com/mit-han-lab/Block-Sparse-Attention): For the foundational work on sparse attention.
- [CogVideoX](https://github.com/THUDM/CogVideo), [Wan2.1](https://github.com/Wan-Video/Wan2.1): For the supported models.
- [TDM](https://github.com/Luo-Yihong/TDM): For the foundational work on the distillation implementation.
- [Diffusers](https://github.com/huggingface/diffusers): For the invaluable diffusion models library.

## 📄 Citation

If you use Video-BLADE in your research, please cite our work:

```bibtex
@article{video-blade-2024,
  title={Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation},
  author={},
  year={2024}
}
```

## 📧 Contact

For any questions or suggestions, feel free to:

- Contact Youping Gu at youpgu71@gmail.com.
- Submit an issue on our [GitHub page](https://github.com/Tacossp/VIDEO-BLADE/issues).