
Cloud Object Detector Adaptation by Integrating Different Source Knowledge (NeurIPS-24)

Shuaifeng Li¹, Mao Ye¹*, Lihua Zhou¹, Nianxin Li¹, Siying Xiao¹, Song Tang², Xiatian Zhu³

¹University of Electronic Science and Technology of China

²University of Shanghai for Science and Technology, ³University of Surrey

Paper | Project | Slides | Poster | Blog | Zhihu (知乎) | Xiaohongshu (小红书)

💥 News

Welcome to my homepage: Shuaifeng Li.

We follow the trend of the times and explore an interesting and promising problem, Cloud Object Detector Adaptation (CODA), where the target domain leverages detections provided by a large vision-language cloud detector to build a target detector. Thanks to the large cloud model, adaptation to open target scenarios and categories becomes possible, so open-set adaptation is no longer a problem.

Please note that CODA does not restrict whether CLIP is used, even though CLIP is used in our method COIN.

🎯 Our previous CVPR'22 ORAL work, Source-Free Object Detection by Learning to Overlook Domain Style, investigates source-free domain adaptive object detection, which considers privacy protection and assumes that source-domain data is inaccessible. If you are interested, you are welcome to explore our Paper and Code.

🎉 Real-world applications

Fortunately, during the paper review process, the successive releases of Grounding DINO 1.5, 1.6, and even DINO-X have provided a timely boost to our work. Moreover, IDEA-Research has officially opened access to the Grounding DINO 1.5 API, offering a more practical and robust application scenario for our paper.

To request an API key for Grounding DINO 1.5, please follow the steps outlined here and install the environment following this guide.

We have written an example for the Foggy-Cityscapes dataset. Please write the obtained TOKEN into the bash files in the scripts/GDINO1.5API/ folder after MODEL.TEACHER_CLOUD.TOKEN, and then run the following commands. Please refer to here for a detailed explanation.

conda activate coin3.9api
bash scripts/GDINO1.5API/test/GDINO1.5API.sh
bash scripts/GDINO1.5API/test/CLIP.sh
bash scripts/GDINO1.5API/pretrain/CLIPDET.sh
bash scripts/GDINO1.5API/final/targetDET.sh
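
For reference, each of those bash files already contains the MODEL.TEACHER_CLOUD.TOKEN key as a KEY VALUE config override; after editing, that part of the command should read something like the following, where the placeholder value is the only thing you supply:

MODEL.TEACHER_CLOUD.TOKEN your_api_token_here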

For datasets other than the six used in the paper, please prepare VOC-format data and register it by adding the corresponding lines in coin/data/datasets/builtin.py.
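
As a reminder, a standard VOC-format dataset directory (not specific to this repo) typically looks like this:

your_new_dataset/
├── Annotations/        # one Pascal VOC XML annotation file per image
├── JPEGImages/         # the images
└── ImageSets/Main/     # train.txt / test.txt split files listing image IDs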

⏳ Preparation

First, clone this repository: git clone https://github.com/Flashkong/COIN.git && cd COIN.

For environment setup, please refer to docs/Environment.md. For dataset preparation, please refer to docs/Datasets.md.

Then, execute the following command:

conda activate coin
rm -rf ./datasets  # Please make sure you have completed all steps in 'docs/Datasets.md'
ln -s your_datasets_dir ./datasets

Download cloud models

First, create a folder for cloud models: mkdir cloud_models.

Then, download the models from the above links or their original GitHub repositories: Grounding DINO and GLIPv1.

Finally, put all cloud models in the cloud_models folder.
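
After downloading, the folder might look like the following; the exact checkpoint filenames depend on the model variants you download and are shown here only as examples:

cloud_models/
├── groundingdino_swint_ogc.pth              # Grounding DINO checkpoint (example filename)
└── glip_tiny_model_o365_goldg_cc_sbu.pth    # GLIP checkpoint (example filename)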

🔥 Get started

Test the performance of cloud detectors

bash scripts/GDINO/test/GDINO.sh
bash scripts/GLIP/test/GLIP.sh

Test the performance of CLIP

bash scripts/GDINO/test/CLIP.sh
bash scripts/GLIP/test/CLIP.sh

Pre-train the CLIP detector (Knowledge Dissemination)

If you don't want to pre-train the CLIP detector, you can directly use our pre-trained CLIP detector for training. For details, please see here.

Execute the following commands to pre-train the CLIP detector. They will first collect the detection results of the cloud detector and CLIP, saving them to GDINO_collect.pth and CLIP_-000001.pth respectively, and then automatically pre-train the CLIP detector.

bash scripts/GDINO/pretrain/CLIPDET.sh
bash scripts/GLIP/pretrain/CLIPDET.sh

To resume training, run the following command. Note that CLIP's detection results have been saved in the model's checkpoint, so there is no need to load them again.

If you want to train from scratch but do not want to perform result collection again, load CLIP_-000001.pth instead.

# modify the value of MODEL.WEIGHTS  e.g. output_GDINO/foggy/pretrain/CLIPDET/CLIP_0002999.pth
bash scripts/GDINO/pretrain/ResumeTrain.sh
bash scripts/GLIP/pretrain/ResumeTrain.sh

Final training (Knowledge Separation and Knowledge Distillation)

Execute the following commands. You need to modify the value of MODEL.WEIGHTS: the first path is the path to the pre-trained CLIP detector, and the second is the path to the detection results collected from the cloud detector, e.g. MODEL.WEIGHTS output_GDINO/foggy/pretrain/CLIPDET/CLIP_0044999.pth+output_GDINO/foggy/pretrain/CLIPDET/GDINO_collect.pth for Foggy-Cityscapes under GDINO.

You can also directly use our pre-trained CLIP detector for training. For details, please see here.

bash scripts/GDINO/final/targetDET.sh
bash scripts/GLIP/final/targetDET.sh

To resume training, run the following command. Note that the detection results from the cloud detector have been saved in the model's checkpoint, so there is no need to load them again.

# modify the value of MODEL.WEIGHTS  e.g. output_GDINO/foggy/gard/targetDet/model_0002999.pth
bash scripts/GDINO/final/ResumeTrain.sh
bash scripts/GLIP/final/ResumeTrain.sh

Test saved checkpoints

During training, the CLIP detector and target detector will be automatically tested. If you want to directly test a saved checkpoint, please run the following command:

# Using Foggy-Cityscapes under GDINO as an example
# Add one line: 'TEST.SAVE_DETECTION_PKLS True' to save the detection results to the 'detections.pckl' file
# Set '--test_model_role clipdet' to test CLIP detector
python train_net.py \
     --num-gpus 1 \
     --config configs/coin/GDINO/foggy.yaml \
     --eval-only \
     --test_model_role targetdet \
     MODEL.WEIGHTS your_checkpoint_path \
     OUTPUT_DIR output_GDINO/foggy/test_targetdet
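
Combining the two comments above, a run that tests the CLIP detector and also saves its detections might look like this (the OUTPUT_DIR value here is just an illustrative choice):

python train_net.py \
     --num-gpus 1 \
     --config configs/coin/GDINO/foggy.yaml \
     --eval-only \
     --test_model_role clipdet \
     TEST.SAVE_DETECTION_PKLS True \
     MODEL.WEIGHTS your_checkpoint_path \
     OUTPUT_DIR output_GDINO/foggy/test_clipdet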

Run under GDINO with class-only output type

Please run the commands in the scripts/GDINO/classonly folder. It contains all the training and testing commands.

🧳 Model Zoo

All trained models are stored at huggingface.co/Flashkong/COIN.

| Name | Cloud detector | Dataset | Backbone | mAP | Link |
| --- | --- | --- | --- | --- | --- |
| CLIPDET (pretrain) | GDINO | Foggy-Cityscapes | ResNet50 | 28.2 | model_zoo/GDINO/foggy/CLIPDET.pth |
| targetDET | GDINO | Foggy-Cityscapes | ResNet50 | 39.0 | model_zoo/GDINO/foggy/targetDET.pth |
| CLIPDET (pretrain) | GDINO | Cityscapes | ResNet50 | 35.7 | model_zoo/GDINO/cityscape/CLIPDET.pth |
| targetDET | GDINO | Cityscapes | ResNet50 | 44.5 | model_zoo/GDINO/cityscape/targetDET.pth |
| CLIPDET (pretrain) | GDINO | BDD100K | ResNet50 | 31.9 | model_zoo/GDINO/BDD100K/CLIPDET.pth |
| targetDET | GDINO | BDD100K | ResNet50 | 39.7 | model_zoo/GDINO/BDD100K/targetDET.pth |
| CLIPDET (pretrain) | GDINO | KITTI | ResNet50 | 79.9 | model_zoo/GDINO/KITTI/CLIPDET.pth |
| targetDET | GDINO | KITTI | ResNet50 | 80.8 | model_zoo/GDINO/KITTI/targetDET.pth |
| CLIPDET (pretrain) | GDINO | SIM | ResNet50 | 60.0 | model_zoo/GDINO/SIM/CLIPDET.pth |
| targetDET | GDINO | SIM | ResNet50 | 62.4 | model_zoo/GDINO/SIM/targetDET.pth |
| CLIPDET (pretrain) | GDINO | Clipart | ResNet50 | 46.2 | model_zoo/GDINO/clipart/CLIPDET.pth |
| targetDET | GDINO | Clipart | ResNet101 | 68.5 | model_zoo/GDINO/clipart/targetDET.pth |

| Name | Cloud detector | Dataset | Backbone | mAP | Link |
| --- | --- | --- | --- | --- | --- |
| CLIPDET (pretrain) | GLIP | Foggy-Cityscapes | ResNet50 | 25.0 | model_zoo/GLIP/foggy/CLIPDET.pth |
| targetDET | GLIP | Foggy-Cityscapes | ResNet50 | 27.7 | model_zoo/GLIP/foggy/targetDET.pth |
| CLIPDET (pretrain) | GLIP | Cityscapes | ResNet50 | 30.9 | model_zoo/GLIP/cityscape/CLIPDET.pth |
| targetDET | GLIP | Cityscapes | ResNet50 | 33.5 | model_zoo/GLIP/cityscape/targetDET.pth |
| CLIPDET (pretrain) | GLIP | BDD100K | ResNet50 | 29.1 | model_zoo/GLIP/BDD100K/CLIPDET.pth |
| targetDET | GLIP | BDD100K | ResNet50 | 33.5 | model_zoo/GLIP/BDD100K/targetDET.pth |
| CLIPDET (pretrain) | GLIP | KITTI | ResNet50 | 55.9 | model_zoo/GLIP/KITTI/CLIPDET.pth |
| targetDET | GLIP | KITTI | ResNet50 | 56.8 | model_zoo/GLIP/KITTI/targetDET.pth |
| CLIPDET (pretrain) | GLIP | SIM | ResNet50 | 35.8 | model_zoo/GLIP/SIM/CLIPDET.pth |
| targetDET | GLIP | SIM | ResNet50 | 37.1 | model_zoo/GLIP/SIM/targetDET.pth |

To verify the above models, please run the following commands:

mkdir model_zoo
# Place the downloaded models according to the Hugging Face directory structure.
bash scripts/modelzoo/GDINO/CLIPDET.sh
bash scripts/modelzoo/GDINO/targetDET.sh
bash scripts/modelzoo/GLIP/CLIPDET.sh
bash scripts/modelzoo/GLIP/targetDET.sh
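
For reference, one way to fetch the checkpoints before running the scripts above (a sketch assuming git-lfs is installed; downloading individual files from the Hugging Face page works just as well):

git lfs install
git clone https://huggingface.co/Flashkong/COIN model_zoo
# if the resulting layout does not already match the Link column above
# (e.g. model_zoo/GDINO/foggy/CLIPDET.pth), rearrange the files accordingly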

Use our pre-trained CLIPDET for final training

Since pre-training a CLIP detector takes some time, you can directly use our pre-trained CLIPDET:

# Using Foggy-Cityscapes under GDINO as an example

# collect detection results only (SOLVER.MAX_ITER 0 performs no training iterations)
python train_net.py \
     --num-gpus 1 \
     --config configs/coin/PRETRAINS/CLIPDET_foggy.yaml \
     SOLVER.MAX_ITER 0 \
     OUTPUT_DIR output_GDINO/foggy/pretrain/CLIPDET

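# final training: load our pre-trained CLIPDET together with the collected cloud detections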
python train_net.py \
     --num-gpus 1 \
     --config configs/coin/GDINO/foggy.yaml \
     MODEL.WEIGHTS model_zoo/GDINO/foggy/CLIPDET.pth+output_GDINO/foggy/pretrain/CLIPDET/GDINO_collect.pth \
     OUTPUT_DIR output_GDINO/foggy/gard/targetDet

💡 Quick Tutorials

Configs (configs/coin):

  • BASELINES: Configuration files for testing cloud models and CLIP.
  • PRETRAINS: Configuration files for pre-training the CLIP detector.
  • GDINO and GLIP: Configuration files for final training.
  • ORACLE: Configuration files for training the oracle model.

Trainers (coin/engine):

  • test.py: For testing cloud models and CLIP.
  • pre_train.py: For pre-training the CLIP detector.
  • trainer.py: For final training.

Models (coin/modeling/meta_arch):

  • gdino.py and glip.py: Entry points for the cloud detectors.
  • gdino_processor.py and glip_processor.py: Post-processing of cloud detection results, used to collect results when pre-training the CLIP detector.
  • gdino_collector.py, glip_collector.py and clip_collector.py: Collectors for saving detection results, used when pre-training the CLIP detector.
  • clip_rcnn.py: Contains two models: a modified CLIP that predicts probabilities using the boxes from the cloud detector, and OpenVocabularyRCNN, the shared architecture of the CLIP detector and the target detector, as shown in Fig. 2(a) of our paper.

CKG network:

  • coin/modeling/merge/ckg.py: The architecture of CKG network.
  • coin/modeling/roi_heads/fast_rcnn.py: The file where CKG is used.

✒️ Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{li2024cloud,
  title={Cloud Object Detector Adaptation by Integrating Different Source Knowledge},
  author={Shuaifeng Li and Mao Ye and Lihua Zhou and Nianxin Li and Siying Xiao and Song Tang and Xiatian Zhu},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=S8SEjerTTg}
}

❤️ Acknowledgement

We would like to express our sincere gratitude to the following projects and their contributors for their invaluable contributions.

📜 Abstract

We propose to explore an interesting and promising problem, Cloud Object Detector Adaptation (CODA), where the target domain leverages detections provided by a large cloud model to build a target detector. Despite its powerful generalization capability, the cloud model still cannot achieve error-free detection in a specific target domain. In this work, we present a novel Cloud Object detector adaptation method by Integrating different source kNowledge (COIN). The key idea is to incorporate a public vision-language model (CLIP) to distill positive knowledge while refining negative knowledge for adaptation by self-promotion gradient direction alignment. To that end, knowledge dissemination, separation, and distillation are carried out successively. Knowledge dissemination combines knowledge from the cloud detector and the CLIP model to initialize a target detector and a CLIP detector in the target domain. By matching the CLIP detector with the cloud detector, knowledge separation categorizes detections into three parts: consistent, inconsistent and private detections, so that a divide-and-conquer strategy can be used for knowledge distillation. Consistent and private detections are directly used to train the target detector, while inconsistent detections are fused by a consistent knowledge generation network, which is trained by aligning the gradient direction of inconsistent detections to that of consistent detections, because the latter provides a direction toward an optimal target detector. Experimental results demonstrate that the proposed COIN method achieves state-of-the-art performance.

Idea

[Figure: illustration of the CODA problem and the COIN idea]
