LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

Hanlin Wang¹,², Hao Ouyang², Qiuyu Wang², Wen Wang³,², Ka Leong Cheng⁴,², Qifeng Chen⁴, Yujun Shen², Limin Wang†,¹
¹State Key Laboratory for Novel Software Technology, Nanjing University
²Ant Group, ³Zhejiang University, ⁴The Hong Kong University of Science and Technology
†corresponding author

TODO List

  • Release gradio demo on huggingface.

Update Log

  • [2024.12.20] 🎉 Exciting News: Interactive demo with gradio for LeviTor has been released!

Setup

Follow the steps below to set up the environment.

  1. Clone the repository

    git clone https://github.com/qiuyu96/LeviTor.git
    cd LeviTor
    
  2. Download and unzip checkpoints
    Create the checkpoints directory:

     mkdir checkpoints
     cd checkpoints
    

    Download 'depth_anything_v2_vitl.pth' from Depth Anything V2
    Download 'sam_vit_h_4b8939.pth' from Segment Anything
    Download 'stable-video-diffusion-img2vid-xt' from stabilityai
    Create LeviTor checkpoint directory:

    mkdir LeviTor
    cd LeviTor
    

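    Then download the LeviTor checkpoint from LeviTor into this directory, and return to the repository root (two levels up), since the later steps use paths relative to it.

    Ensure all the checkpoints are laid out in the checkpoints directory as follows: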

    checkpoints/
     |-- sam_vit_h_4b8939.pth
     |-- depth_anything_v2_vitl.pth
     |-- stable-video-diffusion-img2vid-xt/
     |-- LeviTor/
         |-- random_states_0.pkl
         |-- scaler.pt
         |-- scheduler.bin
         |-- controlnet/
         |-- unet/
    
  3. Create environment

    conda create -n LeviTor python=3.9 -y
    conda activate LeviTor
    
  4. Install packages

    pip install -r requirements.txt
    
  5. Install pytorch3d

    pip install "git+https://github.com/facebookresearch/pytorch3d.git"
    
  6. Install gradio

    pip install gradio==4.36.1
    
  7. Run LeviTor

    python gradio_demo/gradio_run.py \
      --frame_interval 1 \
      --num_frames 16 \
      --pretrained_model_name_or_path checkpoints/stable-video-diffusion-img2vid-xt \
      --resume_from_checkpoint checkpoints/LeviTor \
      --width 288 \
      --height 512 \
      --seed 217113 \
      --mixed_precision fp16 \
      --enable_xformers_memory_efficient_attention \
      --output_dir ./outputs \
      --gaussian_r 10 \
      --sam_path checkpoints/sam_vit_h_4b8939.pth \
      --depthanything_path checkpoints/depth_anything_v2_vitl.pth
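
Before launching, it may help to confirm that the checkpoint layout from step 2 is complete. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and the expected entries are copied from the directory tree shown in step 2. It is meant to be run from the repository root.

```python
from pathlib import Path

# Expected layout, copied from the checkpoint tree in step 2.
EXPECTED_FILES = [
    "sam_vit_h_4b8939.pth",
    "depth_anything_v2_vitl.pth",
    "LeviTor/random_states_0.pkl",
    "LeviTor/scaler.pt",
    "LeviTor/scheduler.bin",
]
EXPECTED_DIRS = [
    "stable-video-diffusion-img2vid-xt",
    "LeviTor/controlnet",
    "LeviTor/unet",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    base = Path(root)
    missing = [f for f in EXPECTED_FILES if not (base / f).is_file()]
    missing += [d + "/" for d in EXPECTED_DIRS if not (base / d).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_checkpoints()
    if problems:
        print("Missing from checkpoints/:")
        for entry in problems:
            print("  " + entry)
    else:
        print("All expected checkpoint entries are present.")
```

The check only looks at names and locations; it does not verify that the downloaded files are intact.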
    

Tutorial

Please read before you try!

I. Upload the start image

Use the Upload Start Image button to upload your image.

II. Select the area you want to operate on

Click the Select Area with SAM button, then click on the image to select the area you want to operate on with SAM.
If the point you are about to click lies inside your area of interest, enter 1 in the Add SAM Point? box. To click a point outside that area, enter 0 instead. Use the Add SAM Point? box in this way to refine the selection until it covers exactly the area you want.

III. Draw a 3D trajectory and run!


Click the Add New Drag Trajectory button and draw a 2D trajectory by clicking a series of points. The depth values of the clicked points then appear in the Depths for reference box. Use them as a guide when choosing the depth values you enter next.

Then enter your depth control values in the Input depth values here box. All depth values should be normalized to the range 0 to 1. The smaller the value, the nearer the area moves. The number of depth values you enter should match the number of values in the Depths for reference box. Note that the depth values you enter are relative: the absolute values do not matter, but the changes between them determine how much nearer or farther the object moves.
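
The two rules above (values in the range 0 to 1, and one value per reference depth) can be checked before typing the values into the demo. The sketch below is only an illustration and is not taken from the repository: `validate_depths` and `depth_changes` are hypothetical helpers, and the sample numbers are made up.

```python
def validate_depths(depths, reference_depths):
    """Check user depth values against the two rules stated in the tutorial."""
    if len(depths) != len(reference_depths):
        raise ValueError(
            f"expected {len(reference_depths)} depth values, got {len(depths)}"
        )
    for value in depths:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"depth {value} is outside the range 0 to 1")
    return list(depths)


def depth_changes(depths):
    """Differences between consecutive depths; a negative change means moving nearer."""
    return [later - earlier for earlier, later in zip(depths, depths[1:])]


# Made-up reference depths for a three-point trajectory.
reference = [0.5, 0.5, 0.5]
depths = validate_depths([0.5, 0.25, 0.75], reference)

# The object first moves nearer (negative change), then farther (positive change).
print(depth_changes(depths))
```

Because the values are relative, `[0.5, 0.25]` and `[0.75, 0.5]` describe the same change in depth: both move the object nearer by 0.25.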

After that, enter a ratio value in the Input number of points for inference here box to choose how many control points are used for inference. A small ratio is beneficial for non-rigid motion, while a large ratio suits rigid-body motion.

Finally, click the Run button to generate your video!

IV. Others


You can also draw multiple trajectories to generate your video.

To generate orbiting effects, one suggestion is to select a reference object and create a mostly stationary path for it. This gives you a reference depth value, which makes it easier to adjust the depth changes of the objects orbiting around it. For example, in the mountain-and-planet example, we set the mountain's depth value to 0.1 and then adjust the planet's depth along its path according to the position of each point. A value greater than 0.1 means the planet is behind the mountain, while a value less than 0.1 means it has moved in front of the mountain, which produces the orbiting effect.
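
One way to come up with depth values for such an orbit is to let them swing above and below the reference depth over the course of the trajectory. The sketch below is an illustration under stated assumptions, not the demo's own logic: the cosine profile, the `amplitude` parameter, and the function name `orbit_depths` are all hypothetical, and only the reference value of 0.1 comes from the example above.

```python
import math


def orbit_depths(num_points, reference_depth=0.1, amplitude=0.05):
    """Depth values for one loop around a reference object (hypothetical helper).

    The path starts behind the reference (depth above reference_depth),
    passes in front of it (depth below reference_depth), and heads back.
    """
    depths = []
    for i in range(num_points):
        phase = 2 * math.pi * i / num_points
        depth = reference_depth + amplitude * math.cos(phase)
        # Keep every value inside the normalized range 0 to 1.
        depths.append(round(min(1.0, max(0.0, depth)), 3))
    return depths


# One depth value per clicked trajectory point, e.g. eight points.
print(orbit_depths(8))
```

Values above 0.1 in the result place the orbiting object behind the reference, and values below 0.1 place it in front. The amplitude would need tuning by eye for a given image.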

Citation

Don't forget to cite this source if it proves useful in your research!

@article{wang2024levitor, 
	title={LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis}, 
	author={Hanlin Wang and Hao Ouyang and Qiuyu Wang and Wen Wang and Ka Leong Cheng and Qifeng Chen and Yujun Shen and Limin Wang}, 
	year={2024}, 
	eprint={2412.15214}, 
	archivePrefix={arXiv}, 
	primaryClass={cs.CV}}

Acknowledgement

Our implementation is based on

Thanks for their remarkable contribution and released code!

Note

Note: This repo is governed by the Apache 2.0 license. We strongly advise users not to knowingly generate or disseminate, or allow others to knowingly generate or disseminate, harmful content, including but not limited to hate speech, violence, pornography, and deception.