---
license: mit
pipeline_tag: image-to-video
library_name: diffusers
---

# DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation

[![Page](https://img.shields.io/badge/github-Project_page-blue?logo=github)](https://soyouthinkyoucantell.github.io/dualcamctrl-page/)
[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://www.arxiv.org/abs/2511.23127)
[![Code](https://img.shields.io/badge/github-Code-blue?logo=github)](https://github.com/EnVision-Research/DualCamCtrl)
[![Ckpt](https://img.shields.io/badge/🤗%20HuggingFace-Checkpoint%20-yellow)](https://huggingface.co/FayeHongfeiZhang/DualCamCtrl/tree/main)

[Hongfei Zhang](#)<sup>1*</sup>, [Kanghao Chen](https://khao123.github.io/)<sup>1,5*</sup>, [Zixin Zhang](https://scholar.google.com/citations?user=BbZ0mwoAAAAJ&hl=en)<sup>1,5</sup>, [Harold H. Chen](https://haroldchen19.github.io/)<sup>1,5</sup>, [Yuanhuiyi Lyu](https://qc-ly.github.io/)<sup>1</sup>, [Yuqi Zhang](#)<sup>3</sup>, [Shuai Yang](https://andysonys.github.io/)<sup>1</sup>, [Kun Zhou](https://redrock303.github.io/)<sup>4</sup>, [Ying-Cong Chen](https://www.yingcong.me/)<sup>1,2,✉</sup>

<sup>1</sup> HKUST(GZ) <sup>2</sup> HKUST <sup>3</sup> Fudan University <sup>4</sup> Shenzhen University <sup>5</sup> Knowin

\* Equal Contribution. ✉ Corresponding author.

## 🧩 Contents

#### 1. [📰 News](#📰-news)
#### 2. [⚙️ TODO](#⚙️-todo)
#### 3. [🎯 Overview](#🎯-overview)
#### 4. [🔧 Installation](#🔧-installation)
#### 5. [🔮 Inference](#🔮-inference)
#### 6. [🔥 Training](#🔥-training)

## 📰 News

✅ 2025.11: Released inference pipeline & demo dataset ✔️

✅ 2025.11: Uploaded official DualCamCtrl checkpoints to HuggingFace 🔑

## ⚙️ TODO

⬜ Release the training code 🚀

## 🎯 Overview

### Abstract

This paper presents **DualCamCtrl**, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based conditions, yet they often lack sufficient scene understanding and geometric awareness.
**DualCamCtrl** specifically targets this limitation by introducing a dual-branch framework that mutually generates camera-consistent RGB and depth sequences. To harmonize these two modalities, we further propose the **S**emant**I**c **G**uided **M**utual **A**lignment (SIGMA) mechanism, which performs RGB–depth fusion in a semantics-guided and mutually reinforced manner. These designs collectively enable **DualCamCtrl** to better disentangle appearance and geometry modeling, generating videos that more faithfully adhere to the specified camera trajectories. Extensive experiments demonstrate that **DualCamCtrl** achieves more consistent camera-controlled video generation, **with an over 40% reduction** in camera motion errors compared with prior methods.

### Results

![I2V Qualitative Comparison](i2vcompare_cropped.jpg)

*Comparison between our method and other state-of-the-art approaches. Given the same camera pose and input image as generation conditions, our method achieves the best alignment between camera motion and scene dynamics, producing the most visually accurate video. The '+' signs marked in the figure serve as anchors for visual comparison.*

![I2V Quantitative Comparison](i2v.png)

*Quantitative comparisons on the **I2V** setting. ↑ / ↓ denotes that higher/lower is better. The best and second-best results are highlighted.*

![T2V Quantitative Comparison](t2v.png)

*Quantitative comparisons on the **T2V** setting across REALESTATE10K and DL3DV.*

## 🔧 Installation

#### Clone the repo and create an environment with Python 3.11:

```
git clone https://github.com/soyouthinkyoucantell/DualCamCtrl.git
conda create -n dualcamctrl python=3.11
conda activate dualcamctrl
```

#### Install DiffSynth-Studio dependencies from source code:

```
cd DualCamCtrl
pip install -e .
```

#### Then install GenFusion dependencies:

```
mkdir dependency
cd dependency
git clone https://github.com/rmbrualla/pycolmap.git
cd pycolmap
pip install -e .
pip install numpy==1.26.4 peft accelerate==1.9.0 decord==0.6.0 deepspeed diffusers omegaconf
```

## 🔮 Inference

### Checkpoints

Get the checkpoints from the HuggingFace repo: [DualCamCtrl Checkpoints](https://huggingface.co/FayeHongfeiZhang/DualCamCtrl)

#### Put them in the checkpoints directory

```
cd ../.. # make sure you are at the root dir
```

Your project structure should look like this:

```
DualCamCtrl/
├── checkpoints/                  # ← Put downloaded .pt here
│   └── dualcamctrl_diffusion_transformer.pt
├── demo_dataset/                 # Small demo dataset structure
├── demo_pic/                     # Demo images for quick inference
├── diffsynth/
├── examples/
├── ....
├── requirements.txt
├── README.md
└── setup.py
```

#### Test with our demo pictures and depth:

```
# Run from the repository root (DualCamCtrl/)
export PYTHONPATH=.
python -m test_script.test_demo
```

## 🔥 Training

### Training details coming soon… Stay tuned! 🚀
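
#### Optional: check the project layout before running the demo

The demo command in the Inference section expects the checkpoint and demo folders to sit at the paths shown in the project tree. The snippet below is a minimal sketch of a pre-flight check, not part of the released code: the function name `find_missing_paths` and the constant `EXPECTED_PATHS` are hypothetical, and the listed paths are simply copied from the project tree above.

```python
from pathlib import Path

# Paths copied from the project tree in the Inference section.
# This list is an assumption about what the demo needs, not an official manifest.
EXPECTED_PATHS = (
    "checkpoints/dualcamctrl_diffusion_transformer.pt",
    "demo_dataset",
    "demo_pic",
    "diffsynth",
    "examples",
)


def find_missing_paths(root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `find_missing_paths(".")` from the repository root should return an empty list once the checkpoint has been downloaded and placed under `checkpoints/`. Any returned entries are the paths that still need to be created or downloaded.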