kkail8 committed · Commit 900c1f6 (verified) · Parent: 45a17fb

Update README.md

---
license: apache-2.0
arxiv: 2503.23377
---

## <div align="center"> JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization</div>

<div align="center">

[[`HomePage`](https://javisdit.github.io/)]
[[`ArXiv Paper`](https://arxiv.org/pdf/2503.23377)]
[[`HF Paper`](https://huggingface.co/papers/2503.23377)]
[[`GitHub`](https://github.com/JavisDiT/JavisDiT/)]
[[`Models`](https://huggingface.co/collections/JavisDiT/javisdit-v01-67f2ac8a0def71591f7e2974)]

</div>

We introduce **JavisDiT**, a novel, state-of-the-art Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG) from open-ended user prompts.
## 📰 News

- **[2025.08.11]** 🔥 We released the data and code for JAVG evaluation. For more details, refer to [here](#evaluation) and [eval/javisbench/README.md](eval/javisbench/README.md).
- **[2025.04.15]** 🔥 We released the data preparation and model training instructions. You can train JavisDiT on your own dataset!
- **[2025.04.07]** 🔥 We released the inference code and a preview model of **JavisDiT-v0.1** at [HuggingFace](https://huggingface.co/JavisDiT), which includes **JavisDiT-v0.1-audio**, **JavisDiT-v0.1-prior**, and **JavisDiT-v0.1-jav** (with a [low-resolution version](https://huggingface.co/JavisDiT/JavisDiT-v0.1-jav-240p4s) and a [full-resolution version](https://huggingface.co/JavisDiT/JavisDiT-v0.1-jav)).
- **[2025.04.03]** We released the repository of [JavisDiT](https://arxiv.org/pdf/2503.23377). Code, model, and data are coming soon.
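As a minimal sketch of fetching the released checkpoints listed above, one could use `snapshot_download` from `huggingface_hub`. The repo ids below come from the model links in this README; the `javisdit_repo_id` helper itself is a hypothetical convenience, not part of the official codebase (see the [GitHub repo](https://github.com/JavisDiT/JavisDiT/) for the supported inference workflow):

```python
def javisdit_repo_id(variant: str) -> str:
    """Build the Hugging Face Hub repo id for a JavisDiT-v0.1 variant.

    Illustrative helper only; valid variants per the links above include
    "audio", "prior", "jav", and "jav-240p4s".
    """
    return f"JavisDiT/JavisDiT-v0.1-{variant}"


if __name__ == "__main__":
    # Requires `pip install huggingface_hub`; downloads large checkpoint files.
    from huggingface_hub import snapshot_download

    # Fetch the full-resolution joint audio-video checkpoint to the local cache.
    local_dir = snapshot_download(repo_id=javisdit_repo_id("jav"))
    print(local_dir)
```

The guarded `__main__` block keeps the actual (network-heavy) download separate from the pure repo-id construction.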
### 👉 TODO
- [ ] Release the data and evaluation code for JavisScore.
- [ ] Derive a more efficient and powerful JAVG model.

## Brief Introduction

**JavisDiT** addresses the key bottleneck of JAVG with Hierarchical Spatio-Temporal Prior Synchronization.

- We introduce **JavisDiT**, a novel Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG) from open-ended user prompts.
- We propose **JavisBench**, a new benchmark consisting of 10,140 high-quality text-captioned sounding videos spanning diverse scenes and complex real-world scenarios.
- We devise **JavisScore**, a robust metric for evaluating the synchronization between generated audio-video pairs in complex real-world content.
- We curate **JavisEval**, a dataset with 3,000 human-annotated samples to quantitatively evaluate the accuracy of synchronization estimation metrics.

We hope to set a new standard for the JAVG community. For more technical details, kindly refer to the original [paper](https://arxiv.org/pdf/2503.23377.pdf).
## Citation

If you find JavisDiT useful and use it in your project, please kindly cite:

```bibtex
@inproceedings{liu2025javisdit,
  title={JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization},
  author={Kai Liu and Wei Li and Lai Chen and Shengqiong Wu and Yanhao Zheng and Jiayi Ji and Fan Zhou and Rongxin Jiang and Jiebo Luo and Hao Fei and Tat-Seng Chua},
  booktitle={arxiv},
  year={2025},
  eprint={2503.23377},
}
```