Championchess/phy-sd


What Does Stable Diffusion Know about the 3D Scene? A General Protocol to Probe Large Vision Models for 3D Physical Understanding

This is the official implementation of the NeurIPS 2024 paper "What Does Stable Diffusion Know about the 3D Scene? A General Protocol to Probe Large Vision Models for 3D Physical Understanding" by Guanqi Zhan, Chuanxia Zheng, Weidi Xie, and Andrew Zisserman. It also includes the physical-property dataset introduced in the paper.


Installation (Python 3.8.8 + Numpy 1.20.1 + PyTorch 1.13.1)

pip install pycocotools
pip install Pillow
pip install scipy
pip install -U scikit-learn
pip install ipdb
pip install scikit-image
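After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch, not part of this repository: the helper name `installed_versions` and the `PACKAGES` list are illustrative, and the distribution names are assumed to match the pip commands above (plus `numpy` and `torch` from the heading). It only reports versions and does not enforce the ones listed.

```python
from importlib import metadata

# Distribution names assumed from the install commands above, plus numpy and torch.
PACKAGES = [
    "numpy", "torch", "pycocotools", "Pillow",
    "scipy", "scikit-learn", "ipdb", "scikit-image",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print a short report so mismatches with the tested versions are easy to spot.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` points at a package that still needs one of the pip commands above.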

Extract Stable Diffusion Feature

Clone the DIFT repository (https://github.com/Tsingularity/dift/tree/main) and put its files under dift/ of this repository. Replace DIFT's src/models/dift_sd.py with dift/dift_sd.py from this repository. Then fill in the required paths and run:

python dift/extract_dift_depth.py

Download Original Datasets

For Same Plane and Perpendicular Plane: https://github.com/NVlabs/planercnn

For Material: https://github.com/apple/ml-dms-dataset

For Shadow: https://github.com/stevewongv/InstanceShadowDetection

For Occlusion: https://github.com/Championchess/A-Tri-Layer-Plugin-to-Improve-Occluded-Detection/tree/master and https://cocodataset.org/#home

For Support Relation and Depth: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

Download Our Datasets

Figure (image4): Row 1 shows Occlusion and Row 2 shows Depth.

Depth

Train/Val/Test Image Names | Regions and Pairs

Train and Test Linear SVM

Depth

python SVM/depth_train_test_svm.py
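For readers unfamiliar with linear probing, here is a minimal sketch of training and testing a linear SVM on region-pair features with scikit-learn. It is not the content of SVM/depth_train_test_svm.py. The data is synthetic, the function names are hypothetical, and representing a pair as the difference of two region feature vectors is an assumption made for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC


def make_synthetic_pairs(n_pairs, dim, seed=0):
    """Hypothetical stand-in for extracted features of region pairs.

    Each pair (A, B) is represented by feat(A) - feat(B) (an assumption);
    the binary label is decided by a hidden linear direction, so the task
    is linearly separable by construction.
    """
    rng = np.random.default_rng(seed)
    hidden_direction = rng.normal(size=dim)
    feats_a = rng.normal(size=(n_pairs, dim))
    feats_b = rng.normal(size=(n_pairs, dim))
    pair_feats = feats_a - feats_b
    labels = (pair_feats @ hidden_direction > 0).astype(int)
    return pair_feats, labels


def train_and_test_linear_svm(X, y, n_train, C=1.0):
    """Fit a linear SVM on the first n_train pairs and score it on the rest."""
    clf = LinearSVC(C=C, max_iter=10000)
    clf.fit(X[:n_train], y[:n_train])
    preds = clf.predict(X[n_train:])
    return clf, accuracy_score(y[n_train:], preds)


X, y = make_synthetic_pairs(n_pairs=600, dim=16)
clf, acc = train_and_test_linear_svm(X, y, n_train=400)
print(f"held-out accuracy on synthetic pairs: {acc:.3f}")
```

With real data, `X` would hold features extracted in the previous step and `y` the annotated relation for each pair. The regularization strength `C` is typically chosen on the validation split.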

Experiment Results

image5 image6 image7

Citation

Please cite our papers if you use the code, models, or dataset from this repository.

@article{zhan2023does,
  title={What Does Stable Diffusion Know about the 3D Scene?},
  author={Zhan, Guanqi and Zheng, Chuanxia and Xie, Weidi and Zisserman, Andrew},
  journal={arXiv preprint arXiv:2310.06836},
  year={2023}
}
@inproceedings{zhan2024general,
  title={A general protocol to probe large vision models for 3d physical understanding},
  author={Zhan, Guanqi and Zheng, Chuanxia and Xie, Weidi and Zisserman, Andrew},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}
