What Does Stable Diffusion Know about the 3D Scene? A General Protocol to Probe Large Vision Models for 3D Physical Understanding
This is the official implementation of the NeurIPS 2024 paper "What Does Stable Diffusion Know about the 3D Scene? A General Protocol to Probe Large Vision Models for 3D Physical Understanding" by Guanqi Zhan, Chuanxia Zheng, Weidi Xie, and Andrew Zisserman. It also includes the dataset of physical properties introduced in the paper.
pip install pycocotools
pip install Pillow
pip install scipy
pip install -U scikit-learn
pip install ipdb
pip install scikit-image
Clone the DIFT repository (https://github.com/Tsingularity/dift/tree/main) and put the files under dift/ of this repository. Use dift/dift_sd.py from this repository to replace src/models/dift_sd.py. Then fill in the paths and run:
python dift/extract_dift_depth.py
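Once a feature map has been extracted, a per-region descriptor is typically obtained by pooling the features inside a region mask. The snippet below is only a minimal sketch of that idea, not the code in dift/extract_dift_depth.py: the function name region_feature, the (C, H, W) layout, and the assumption that the mask has already been resized to the feature-map resolution are all hypothetical.

```python
import numpy as np


def region_feature(feature_map: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average a (C, H, W) feature map over the True pixels of an (H, W) mask.

    Hypothetical helper: assumes the mask is boolean and already matches the
    spatial resolution of the feature map.
    """
    if feature_map.shape[1:] != mask.shape:
        raise ValueError("mask must match the spatial size of the feature map")
    if not mask.any():
        raise ValueError("mask selects no pixels")
    # Boolean indexing gathers the selected pixels into a (C, N) array.
    return feature_map[:, mask].mean(axis=1)


# Toy example: 2 channels on a 2x2 grid, region covering the diagonal pixels.
feature_map = np.arange(8, dtype=float).reshape(2, 2, 2)
mask = np.array([[True, False], [False, True]])
pooled = region_feature(feature_map, mask)
print(pooled)  # one value per channel
```

Averaging gives a fixed-length vector per region regardless of region size, which is what a downstream linear classifier needs.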
For Same Plane and Perpendicular Plane: https://github.com/NVlabs/planercnn
For Material: https://github.com/apple/ml-dms-dataset
For Shadow: https://github.com/stevewongv/InstanceShadowDetection
For Occlusion: https://github.com/Championchess/A-Tri-Layer-Plugin-to-Improve-Occluded-Detection/tree/master and https://cocodataset.org/#home
For Support Relation and Depth: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
Row 1 for Occlusion and Row 2 for Depth
Train/Val/Test Image Names | Regions and Pairs
python SVM/depth_train_test_svm.py
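For readers who want to see the general shape of this probing step, the following is a minimal sketch of training and testing a linear SVM on region-pair features with scikit-learn. It is not the code in SVM/depth_train_test_svm.py: the synthetic data, the use of the feature difference between the two regions as the classifier input, and ROC AUC as the metric are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.svm import LinearSVC


def make_toy_pairs(n_pairs, dim, rng):
    """Synthetic stand-in for region-pair features (hypothetical data)."""
    feat_a = rng.normal(size=(n_pairs, dim))
    feat_b = rng.normal(size=(n_pairs, dim))
    # Toy labeling rule: 1 when region A scores higher along a fixed direction.
    direction = np.ones(dim) / np.sqrt(dim)
    pair_input = feat_a - feat_b  # assumed pair encoding: feature difference
    labels = (pair_input @ direction > 0).astype(int)
    return pair_input, labels


rng = np.random.default_rng(0)
x_train, y_train = make_toy_pairs(400, 16, rng)
x_test, y_test = make_toy_pairs(200, 16, rng)

# Linear probe: a linear SVM fitted on the training pairs only.
probe = LinearSVC(C=1.0, max_iter=10000)
probe.fit(x_train, y_train)

# Score held-out pairs with the signed distance to the decision boundary.
auc = roc_auc_score(y_test, probe.decision_function(x_test))
print(f"test AUC: {auc:.3f}")
```

With real data, x_train and x_test would come from the extracted region features of the Train and Test splits, and C would be chosen on the Val split.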
Please cite our papers if you use the code, model, or dataset from this repository.
@article{zhan2023does,
title={What Does Stable Diffusion Know about the 3D Scene?},
author={Zhan, Guanqi and Zheng, Chuanxia and Xie, Weidi and Zisserman, Andrew},
journal={arXiv preprint arXiv:2310.06836},
year={2023}
}
@inproceedings{zhan2024general,
title={A general protocol to probe large vision models for 3d physical understanding},
author={Zhan, Guanqi and Zheng, Chuanxia and Xie, Weidi and Zisserman, Andrew},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}