GYP666
/

BLADE

GYP666 committed on Aug 14

Commit

1deef16

verified ·

1 Parent(s): d9dce4d

Update README.md

README.md +204 -1

@@ -1,4 +1,207 @@
 ---
 license: apache-2.0
 ---
-This repository contains the official model weights for Video-BLADE, a framework for efficient video generation achieved through the joint training of adaptive sparse attention and step distillation.

 ---
 license: apache-2.0
 ---
+# Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
+<div align="center">
+[📖 Paper](https://tacossp.github.io/BLADE-Homepage/) | [🚀 Homepage](https://www.google.com/search?q=%23-quick-start) | [💾 Models](https://huggingface.co/GYP666/VIDEO-BLADE) | [📖 中文阅读](README_zh.md)
+</div>
+Video-BLADE is a data-free framework for efficient video generation. It jointly trains an adaptive block-sparse attention mechanism with step distillation, which accelerates video generation models by cutting the number of inference steps from 50 to 8 while maintaining high-quality generation.
+## 📢 News
+  - **[Aug 2024]** 🎉 The code and pre-trained models for Video-BLADE have been released!
+  - **[Aug 2024]** 📝 Support for two mainstream video generation models, CogVideoX-5B and WanX-1.3B, is now available.
+  - **[Aug 2024]** ⚡ Achieved high-quality video generation in just 8 steps, a significant speedup compared to the 50-step baseline.
+## ✨ Key Features
+  - 🚀 **Efficient Inference**: Reduces the number of inference steps from 50 to 8 while preserving generation quality.
+  - 🎯 **Adaptive Sparse Attention**: Employs a block-sparse attention mechanism to significantly reduce computational complexity.
+  - 📈 **Step Distillation**: Utilizes Trajectory Distribution Matching (TDM), enabling training without the need for video data.
+  - 🎮 **Plug-and-Play**: Supports CogVideoX-5B and WanX-1.3B models without requiring modifications to their original architectures.
+## 🛠️ Environment Setup
+### System Requirements
+  - Python >= 3.11 (Recommended)
+  - CUDA >= 11.6 (Recommended)
+  - GPU Memory >= 24GB (for Inference)
+  - GPU Memory >= 80GB (for Training)
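Before installing, it can help to compare the local machine against the requirements listed here. The following is a minimal sketch, not part of the repository: `version_ge` is a hypothetical helper, it assumes GNU `sort -V` is available, and it assumes the interpreter is reachable as `python3`. GPU memory is only reported, not compared against the 24GB and 80GB thresholds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the interpreter version against the recommended minimum.
if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')"
    if version_ge "$py_version" "3.11"; then
        echo "Python $py_version: OK"
    else
        echo "Python $py_version: older than the recommended 3.11"
    fi
else
    echo "python3 not found on PATH"
fi

# List each GPU with its total memory, if the NVIDIA driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
        || echo "nvidia-smi is installed but could not query the GPU"
else
    echo "nvidia-smi not found; cannot check GPU memory"
fi
```

The reported `memory.total` values can then be read against the inference and training requirements by hand.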
+### Installation Steps
+1.  **Clone the repository**
+    ```bash
+    git clone https://github.com/Tacossp/VIDEO-BLADE
+    cd VIDEO-BLADE
+    ```
+2.  **Install dependencies**
+    ```bash
+    # Option 1: install with uv (recommended)
+    uv pip install -r requirements.txt
+    # Option 2: install with plain pip instead (run only one of the two)
+    pip install -r requirements.txt
+    ```
+3.  **Compile the Block-Sparse-Attention library**
+    ```bash
+    git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
+    cd Block-Sparse-Attention
+    pip install packaging
+    pip install ninja
+    python setup.py install
+    cd ..
+    ```
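After these installation steps, a quick import check can confirm that the build succeeded. This is a sketch under stated assumptions: `check_import` is a hypothetical helper, `torch` is assumed to be installed by `requirements.txt`, and `block_sparse_attn` is taken from the package directory shown in the project structure. It also assumes the interpreter is reachable as `python3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# report whether a Python module can be imported by the active interpreter.
check_import() {
    if python3 -c "import $1" >/dev/null 2>&1; then
        echo "ok:      $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# torch is an assumed dependency; block_sparse_attn is the compiled library.
for module in torch block_sparse_attn; do
    check_import "$module" || true
done
```

A `missing` line for `block_sparse_attn` usually means the compile step needs to be rerun in the same Python environment.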
+## 📥 Model Weights Download
+### Base Model Weights
+Please download the following base model weights and place them in the specified directories:
+1.  **CogVideoX-5B Model**
+    ```bash
+    # Download from Hugging Face
+    git lfs install
+    git clone https://huggingface.co/zai-org/CogVideoX-5b cogvideox/CogVideoX-5b
+    ```
+2.  **WanX-1.3B Model**
+    ```bash
+    # Download from Hugging Face
+    git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers wanx/wan1.3b
+    ```
+### Pre-trained Video-BLADE Weights
+We provide pre-trained weights for Video-BLADE:
+```bash
+# Download pre-trained weights
+git clone https://huggingface.co/GYP666/VIDEO-BLADE pretrained_weights
+```
+### Weight Directory Structure
+Ensure your directory structure for weights is as follows:
+```
+VIDEO-BLADE/
+├── cogvideox/
+│   └── CogVideoX-5b/           # Base model weights for CogVideoX
+├── wanx/
+│   └── wan1.3b/               # Base model weights for WanX
+└── pretrained_weights/         # Pre-trained weights for Video-BLADE
+    ├── BLADE_cogvideox_weight/
+    └── BLADE_wanx_weight/
+```
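The layout above can be verified with a short script before running inference. This is a minimal sketch: `check_weights` is a hypothetical helper that is not shipped with the repository, and the four paths are copied from the directory tree shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): check that the weight
# directories from the documented layout exist under the given project root.
check_weights() {
    root="${1:-.}"
    missing=0
    for dir in \
        cogvideox/CogVideoX-5b \
        wanx/wan1.3b \
        pretrained_weights/BLADE_cogvideox_weight \
        pretrained_weights/BLADE_wanx_weight
    do
        if [ -d "$root/$dir" ]; then
            echo "found:   $dir"
        else
            echo "missing: $dir"
            missing=1
        fi
    done
    # Non-zero exit status when at least one directory is absent.
    return "$missing"
}
```

Run it from the repository root as `check_weights .`, or pass the path to the `VIDEO-BLADE/` checkout as the first argument.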
+## 🚀 Quick Start - Inference
+### CogVideoX Inference
+```bash
+cd cogvideox
+python train/inference.py \
+    --lora_path ../pretrained_weights/cogvideox_checkpoints/your_checkpoint \
+    --gpu 0
+```
+**Argument Descriptions**:
+  - `--lora_path`: Path to the LoRA weights file.
+  - `--gpu`: The ID of the GPU device to use (Default: 0).
+**Output**: The generated videos will be saved in the `cogvideox/outputs/inference/` directory.
+### WanX Inference
+```bash
+cd wanx
+python train/inference.py \
+    --lora_path ../pretrained_weights/wanx_checkpoints/your_checkpoint \
+    --gpu 0
+```
+**Output**: The generated videos will be saved in the `wanx/outputs/` directory.
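Both inference commands follow the same pattern, so they can be wrapped in one function. The sketch below is an illustration only: `run_blade` and the `DRY_RUN` variable are hypothetical names, and the wrapper uses only the `--lora_path` and `--gpu` flags documented above. It must be called from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository).
# Usage: run_blade <cogvideox|wanx> <lora_path> [gpu_id]
# Set DRY_RUN=1 to print the command instead of running it.
run_blade() {
    model="$1"
    lora_path="$2"
    gpu="${3:-0}"   # same default as the documented --gpu flag

    case "$model" in
        cogvideox|wanx) ;;
        *) echo "unknown model: $model (expected cogvideox or wanx)" >&2; return 2 ;;
    esac

    # inference.py is launched from inside the model directory, so a relative
    # lora_path is resolved against that directory, as in the commands above.
    (
        cd "$model" || exit 1
        ${DRY_RUN:+echo} python train/inference.py --lora_path "$lora_path" --gpu "$gpu"
    )
}
```

For example, `DRY_RUN=1 run_blade wanx ../pretrained_weights/wanx_checkpoints/your_checkpoint` prints the WanX command without executing it, which is a cheap way to confirm the paths before starting a GPU job.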
+## 📊 Project Structure
+```
+VIDEO-BLADE/
+├── README.md                   # Project documentation
+├── requirements.txt           # List of Python dependencies
+│
+├── cogvideox/                 # Code related to CogVideoX
+│   ├── CogVideoX-5b/         # Directory for base model weights
+│   ├── train/                # Training scripts
+│   │   ├── inference.py      # Inference script
+│   │   ├── train_cogvideo_tdm.py  # Training script
+│   │   ├── train_tdm_1.sh    # Script to launch training
+│   │   ├── modify_cogvideo.py # Model modification script
+│   │   └── config.yaml       # Training configuration file
+│   ├── prompts/              # Preprocessed prompts and embeddings
+│   └── outputs/              # Output from training and inference
+│
+├── wanx/                     # Code related to WanX
+│   ├── wan1.3b/             # Directory for base model weights
+│   ├── train/               # Training scripts
+│   │   ├── inference.py     # Inference script
+│   │   ├── train_wanx_tdm.py # Training script
+│   │   ├── train_wanx_tdm.sh # Script to launch training
+│   │   └── modify_wan.py    # Model modification script
+│   ├── prompts/             # Preprocessed prompts and embeddings
+│   └── outputs/             # Output from training and inference
+│
+├── utils/                   # Utility scripts
+│   ├── process_prompts_cogvideox.py  # Data preprocessing for CogVideoX
+│   ├── process_prompts_wanx.py       # Data preprocessing for WanX
+│   └── all_dimension_aug_wanx.txt    # Training prompts for WanX
+│
+├── Block-Sparse-Attention/  # Sparse attention library
+│   ├── setup.py            # Compilation and installation script
+│   ├── block_sparse_attn/  # Core library code
+│   └── README.md           # Library usage instructions
+│
+└── ds_config.json          # DeepSpeed configuration file
+```
+## 🤝 Acknowledgements
+  - [FlashAttention](https://github.com/Dao-AILab/flash-attention), [Block-Sparse-Attention](https://github.com/mit-han-lab/Block-Sparse-Attention): For the foundational work on sparse attention.
+  - [CogVideoX](https://github.com/THUDM/CogVideo), [Wan2.1](https://github.com/Wan-Video/Wan2.1): For the supported models.
+  - [TDM](https://github.com/Luo-Yihong/TDM): For the foundational work on distillation implementation.
+  - [Diffusers](https://github.com/huggingface/diffusers): For the invaluable diffusion models library.
+## 📄 Citation
+If you use Video-BLADE in your research, please cite our work:
+```bibtex
+@article{video-blade-2024,
+  title={Video-BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation},
+  author={},
+  year={2024}
+}
+```
+## 📧 Contact
+For any questions or suggestions, feel free to:
+  - Contact Youping Gu at youpgu71@gmail.com.
+  - Submit an issue on our [GitHub page](https://github.com/Tacossp/VIDEO-BLADE/issues).