snap-research/MSRVTT-Personalization

MSRVTT-Personalization

Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov

arXiv Project Page

In this paper, we introduce MSRVTT-Personalization, a new benchmark for the task of video personalization. It is designed to assess subject fidelity accurately and supports several conditioning modes: conditioning on face crops, on single or multiple arbitrary subjects, and on a combination of foreground objects and background.

We include the testing dataset and evaluation protocol in this repository. We show a test sample of MSRVTT-Personalization below:

(Test sample: ground truth video and its personalization annotations.)
We will remove video samples from GitHub, the project webpage, or technical presentations upon request. To make a request, please contact tsaishienchen at gmail dot com.

Leaderboard

  • MSRVTT-Personalization evaluates a model across five metrics:

    • Text similarity (Text-S)
    • Video similarity (Vid-S)
    • Subject similarity (Subj-S)
    • Face similarity (Face-S)
    • Dynamic degree (Dync-D)
  • Quantitative evaluation:

    • Subject mode of MSRVTT-Personalization (condition on an entire subject image)

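      | Method          | Text-S | Vid-S | Subj-S | Dync-D |
      |-----------------|--------|-------|--------|--------|
      | ELITE           | 0.245  | 0.620 | 0.359  | -      |
      | VideoBooth      | 0.222  | 0.612 | 0.395  | 0.448  |
      | DreamVideo      | 0.261  | 0.611 | 0.310  | 0.311  |
      | Video Alchemist | 0.269  | 0.732 | 0.617  | 0.466  |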
    • Face mode of MSRVTT-Personalization (condition on a face crop image)

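      | Method          | Text-S | Vid-S | Face-S | Dync-D |
      |-----------------|--------|-------|--------|--------|
      | IP-Adapter      | 0.251  | 0.648 | 0.269  | -      |
      | PhotoMaker      | 0.278  | 0.569 | 0.189  | -      |
      | Magic-Me        | 0.251  | 0.602 | 0.135  | 0.418  |
      | Video Alchemist | 0.273  | 0.687 | 0.382  | 0.424  |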
  • Qualitative evaluation:

Evaluation Protocol

To be added.
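The official protocol is not yet published here, so the following is only a rough sketch of how an embedding-based score such as Text-S, Vid-S, Subj-S, or Face-S might be aggregated over a video. It assumes each score is a cosine similarity between per-frame embeddings and a reference embedding, averaged over frames. That assumption, the function names `cosine_similarity` and `mean_frame_similarity`, and the toy vectors are all illustrative and are not taken from this repository; the feature extractors that would produce real embeddings are not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_frame_similarity(frame_embeddings, reference_embedding):
    """Average cosine similarity of each frame embedding to one reference.

    Hypothetical aggregation: the benchmark's actual protocol may differ.
    """
    scores = [cosine_similarity(f, reference_embedding) for f in frame_embeddings]
    return sum(scores) / len(scores)


# Toy 2-D embeddings standing in for real model features (assumption).
frames = [[1.0, 0.0], [0.0, 1.0]]
reference = [1.0, 0.0]
score = mean_frame_similarity(frames, reference)
print(f"mean similarity: {score:.3f}")
```

In practice the toy vectors would be replaced by features from whichever text, image, or face encoder the released protocol prescribes.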

Citation

If you find this project useful for your research, please cite our paper. 😊

@article{chen2025videoalchemist,
  title   = {Multi-subject Open-set Personalization in Video Generation},
  author  = {Chen, Tsai-Shien and Siarohin, Aliaksandr and Menapace, Willi and Fang, Yuwei and Lee, Kwot Sin and Skorokhodov, Ivan and Aberman, Kfir and Zhu, Jun-Yan and Yang, Ming-Hsuan and Tulyakov, Sergey},
  journal = {arXiv preprint arXiv:2501.06187},
  year    = {2025}
}

Contact Information

Tsai-Shien Chen: tsaishienchen at gmail dot com
