lizhaoyang
committed on
Commit
·
814246b
1
Parent(s):
d4bf1d3
Update README
README.md
CHANGED
@@ -48,7 +48,8 @@ license: apache-2.0
 
 ## Overview
 BindWeave is a unified subject-consistent video generation framework for single- and multi-subject prompts, built on an MLLM-DiT architecture that couples a pretrained multimodal large language model with a diffusion transformer.
-It achieves cross-modal integration via entity grounding and representation alignment, leveraging the MLLM to parse complex prompts and produce subject-aware hidden states that condition the DiT for high-fidelity generation.
+It achieves cross-modal integration via entity grounding and representation alignment, leveraging the MLLM to parse complex prompts and produce subject-aware hidden states that condition the DiT for high-fidelity generation. For more details or tutorials refer to [ByteDance/BindWeave](https://github.com/bytedance/BindWeave)
+
 
 ### OpenS2V-Eval Performance
 BindWeave achieves a solid score of 57.61 on the [OpenS2V-Eval](https://huggingface.co/spaces/BestWishYsh/OpenS2V-Eval) benchmark, highlighting its robust capabilities across multiple evaluation dimensions and demonstrating competitive performance against several leading open-source and commercial systems.