
Commit

update spec/fixed typo
longzw1997 authored and BIGBALLON committed Oct 20, 2023
1 parent fc11025 commit 1cf3c81
Showing 6 changed files with 48 additions and 73 deletions.
99 changes: 34 additions & 65 deletions README.md
@@ -1,8 +1,7 @@
<div align="center">
<img src="figs/cute_dino.png" width="40%">
<img src="figs/cute_dino.png" width="35%">
</div>


# Open GroundingDino

This is a third-party implementation of the paper **[Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://arxiv.org/abs/2303.05499)** by [Zuwei Long]() and [Wei Li](https://github.com/bigballon).
@@ -12,7 +11,7 @@ This is the third party implementation of the paper **[Grounding DINO: Marrying

# Supported Features

| | Official release version | The Version We Replicated |
| | Official release version | The version we replicated |
| ------------------------------ | ------------------------ | ------------------------- |
| Inference | &#10004; | &#10004; |
| Train (Object Detection data)  | &#10006;                 | &#10004;                  |
@@ -24,7 +23,7 @@ This is the third party implementation of the paper **[Grounding DINO: Marrying

# Setup

We test our models under ```python=3.7.11,pytorch=1.11.0,cuda=11.3```. Other versions might be available as well.
We test our models under ``python=3.7.11, pytorch=1.11.0, cuda=11.3``. Other versions may work as well.

1. Clone the GroundingDINO repository from GitHub.

@@ -38,24 +37,11 @@ git clone https://github.com/longzw1997/Open-GroundingDino.git && cd Open-Ground
pip install -r requirements.txt
cd models/GroundingDINO/ops
python setup.py build install
# unit test (all checks should print True)
python test.py
cd ../../..
```

3. Download [pre-trained model](https://github.com/IDEA-Research/GroundingDINO/releases) and [BERT](https://huggingface.co/bert-base-uncased) weights.

```bash
mkdir weights
cd weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
mkdir bert_weights
cd bert_weights
wget -q https://drive.google.com/drive/folders/1eM1HYf2K161YPzIcRDDMzE7S4WBGmDLM?usp=share_link
cd ..
```

3. Download [pre-trained model](https://github.com/IDEA-Research/GroundingDINO/releases) and [BERT](https://huggingface.co/bert-base-uncased) weights, then modify the corresponding paths in the train/test script.

# Dataset

@@ -141,87 +127,71 @@ config/datasets_mixed_odvg.json # support mixed dataset for both OD and VG

# Training

* Before starting the training, you need to modify the ''config/datasets_vg_example.json'' according to ''data_format.md''
* The evaluation code defaults to using coco_val2017 for evaluation. If you are evaluating with your own test set, you need to convert the test data to coco format (not the ovdg format in data_format.md), and modify the config to set use_coco_eval = False. (The COCO dataset has 80 classes used for training but 90 categories in total, so there is a built-in mapping in the code.)
- Before starting the training, you need to modify the ``config/datasets_vg_example.json`` according to ``data_format.md``.
- The evaluation code defaults to using coco_val2017 for evaluation. If you are evaluating with your own test set, you need to convert the test data to COCO format (not the odvg format in data_format.md) and modify the config to set **use_coco_eval = False**. (The COCO dataset has 80 classes used for training but 90 category ids in total, so there is a built-in mapping in the code.)
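The COCO-format conversion mentioned above can be sketched as follows. This is a minimal, hypothetical example, assuming your own annotations come as `(file_name, width, height, boxes, labels)` tuples with `[x, y, w, h]` pixel boxes; the helper name and input layout are illustrative and not part of this repo:

```python
import json

def to_coco_format(samples, class_names):
    """Convert (file_name, width, height, boxes, labels) samples into a
    minimal COCO-style annotation dict.  Category ids here start from 0 to
    match this repo's label_map convention; the official COCO dataset
    itself uses 1-based, non-contiguous ids."""
    categories = [{"id": i, "name": n} for i, n in enumerate(class_names)]
    images, annotations = [], []
    ann_id = 0
    for img_id, (file_name, width, height, boxes, labels) in enumerate(samples):
        images.append({"id": img_id, "file_name": file_name,
                       "width": width, "height": height})
        for (x, y, w, h), label in zip(boxes, labels):
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": label, "bbox": [x, y, w, h],
                                "area": w * h, "iscrowd": 0})
            ann_id += 1
    return {"images": images, "annotations": annotations,
            "categories": categories}

coco = to_coco_format(
    [("000001.jpg", 640, 480, [[10, 20, 100, 200]], [0])],
    ["person", "bicycle"],
)
with open("my_test_coco.json", "w") as f:
    json.dump(coco, f)
```

The resulting json can then be pointed to from the dataset config in place of coco_val2017.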


``` bash
# train/eval on slrum cluster:
bash train_slrum.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_slrum.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
# e.g. check train_slrum.sh for more details
# bash train_slrum.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_slrum.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs

# train/eval on torch.distributed.launch:
bash train_dist.sh ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_dist.sh ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
```

# train/eval on slurm cluster:
bash train_slurm.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_slurm.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
# see train_slurm.sh for more details, e.g.:
# bash train_slurm.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_slurm.sh v100_32g 8 config/cfg_coco.py config/datasets_od_example.json ./logs
```

# Results and Models

<table>
<table style="font-size:11px;" >
<thead>
<tr style="text-align: right;">
<th></th>
<th>Name</th>
<th>Backbone</th>
<th>Style</th>
<th>Pretrain data</th>
<th>Task</th>
<th>mAP on COCO</th>
<th>Checkpoint</th>
<th>Config</th>
<th>log</th>
<th>Ckpt</th>
<th>Misc</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>GroundingDINO-T (offical)</td>
<td>Swin-T</td>
<td>zero-shot</td>
<td>GroundingDINO-T<br>(official)</td>
<td>O365,GoldG,Cap4M</td>
<td>48.4 (zero-shot) </td>
<td><a href="https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth">GitHub link</a>
<td>link</a></td>
<td>link</a></td>
<td>zero-shot</td>
<td>48.4<br>(zero-shot)</td>
<td><a href="https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth">model</a>
<td> - </td>
</tr>
<tr>
<th>2</th>
<td>GroundingDINO-T (finetune) </td>
<td>Swin-T</td>
<td>use coco finetune</td>
<td>GroundingDINO-T<br>(fine-tune)</td>
<td>O365,GoldG,Cap4M</td>
<td>57.3 (fine-tune)</td>
<td><a href="https://drive.google.com/file/d/1H9xWCUr1vhrxM9lvENfJUxsXv44JaDee/view?usp=drive_link">GitHub link</a>
<td><a href="https://drive.google.com/file/d/1TJRAiBbVwj3AfxvQAoi1tmuRfXH1hLie/view?usp=drive_link">link</a></td>
<td><a href="https://drive.google.com/file/d/1u8XyvBug56SrJY85UtMZFPKUIzV3oNV6/view?usp=drive_link">link</a></td>
<td>finetune<br>w/ coco</td>
<td><b>57.3</b><br>(fine-tune)</td>
<td><a href="https://drive.google.com/file/d/1H9xWCUr1vhrxM9lvENfJUxsXv44JaDee/view?usp=drive_link">model</a>
<td><a href="https://drive.google.com/file/d/1TJRAiBbVwj3AfxvQAoi1tmuRfXH1hLie/view?usp=drive_link">cfg</a> | <a href="https://drive.google.com/file/d/1u8XyvBug56SrJY85UtMZFPKUIzV3oNV6/view?usp=drive_link">log</a></td>
</tr>
<tr>
<th>3</th>
<td>GroundingDINO-T (pretrain)</td>
<td>Swin-T</td>
<td>GroundingDINO-T<br>(pretrain)</td>
<td>COCO,O365,LIVS,V3Det,<br>GRIT-200K,Flickr30k(total 1.8M)</td>
<td>zero-shot</td>
<td>COCO,Objects365,LVIS,V3Det,GRIT-200K,Flickr30k (total 1.8M)</td>
<td>55.1 (zero-shot)</td>
<td><a href="https://drive.google.com/file/d/1ayAVNuIXXTGSJv7AyECdibFJVhlFjyux/view?usp=drive_link">GitHub link</a>
<td><a href='https://drive.google.com/file/d/1LwtkvBHkP1OkErKBsVfwjcedVXkyocA5/view?usp=drive_link'>link</a></td>
<td><a href="https://drive.google.com/file/d/1kBEFk14OqcYHC7DPdA_BGtk2TBQkJtrL/view?usp=drive_link">link</a></td>
<td><b>55.1</b><br>(zero-shot)</td>
<td><a href="https://drive.google.com/file/d/1ayAVNuIXXTGSJv7AyECdibFJVhlFjyux/view?usp=drive_link">model</a>
<td><a href='https://drive.google.com/file/d/1LwtkvBHkP1OkErKBsVfwjcedVXkyocA5/view?usp=drive_link'>cfg</a> | <a href="https://drive.google.com/file/d/1kBEFk14OqcYHC7DPdA_BGtk2TBQkJtrL/view?usp=drive_link">log</a></td>
</tr>
</tbody>
</table>

GRIT-200K generated by [GLIP](https://github.com/microsoft/GLIP) and [spaCy](https://spacy.io/).
- [GRIT](https://huggingface.co/datasets/zzliang/GRIT)-200K generated by [GLIP](https://github.com/microsoft/GLIP) and [spaCy](https://spacy.io/).


# Contact

- longzuwei at sensetime.com
- liwei1 at sensetime.com

Any discussions, suggestions and questions are welcome!
# Acknowledgments

The provided code is adapted from:
@@ -242,5 +212,4 @@ }
}
```

Feel free to contact me if you have any suggestions or questions, issues are welcome,
create a PR if you find any bugs or you want to contribute.
Feel free to contact us if you have any suggestions or questions. Bug reports are also welcome. Please create a pull request if you find any bugs or want to contribute code.
6 changes: 6 additions & 0 deletions data_format.md
@@ -1,6 +1,8 @@

# DATASETS file

#### e.g. ``config/datasets_mixed_odvg.json``

The 'train' field supports multiple datasets for simultaneous training, and 'dataset_model' needs to be set to 'odvg'.
The 'val' field only supports datasets in COCO format, so 'dataset_model' should be set to 'coco' and 'label_map' should be set to null.
```json
@@ -28,13 +30,16 @@ The 'val' only supports datasets in the COCO format, so 'dataset_model' should
]
}
```

# label_map:

In dictionary form, indices start from "0" (it is essential to start from 0 to accommodate caption/grounding data). Here is an example:
```json
{"0": "person", "1": "bicycle", "2": "car", "3": "motorcycle", "4": "airplane", "5": "bus", "6": "train", "7": "truck", "8": "boat", "9": "traffic light", "10": "fire hydrant", "11": "stop sign", "12": "parking meter", "13": "bench", "14": "bird", "15": "cat", "16": "dog", "17": "horse", "18": "sheep", "19": "cow", "20": "elephant", "21": "bear", "22": "zebra", "23": "giraffe", "24": "backpack", "25": "umbrella", "26": "handbag", "27": "tie", "28": "suitcase", "29": "frisbee", "30": "skis", "31": "snowboard", "32": "sports ball", "33": "kite", "34": "baseball bat", "35": "baseball glove", "36": "skateboard", "37": "surfboard", "38": "tennis racket", "39": "bottle", "40": "wine glass", "41": "cup", "42": "fork", "43": "knife", "44": "spoon", "45": "bowl", "46": "banana", "47": "apple", "48": "sandwich", "49": "orange", "50": "broccoli", "51": "carrot", "52": "hot dog", "53": "pizza", "54": "donut", "55": "cake", "56": "chair", "57": "couch", "58": "potted plant", "59": "bed", "60": "dining table", "61": "toilet", "62": "tv", "63": "laptop", "64": "mouse", "65": "remote", "66": "keyboard", "67": "cell phone", "68": "microwave", "69": "oven", "70": "toaster", "71": "sink", "72": "refrigerator", "73": "book", "74": "clock", "75": "vase", "76": "scissors", "77": "teddy bear", "78": "hair drier", "79": "toothbrush"}
```
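A label_map like the one above can be generated from an ordered class list; a short sketch (the `classes` list here is just the first few names from the example, and the helper name is illustrative):

```python
import json

def make_label_map(class_names):
    # Keys must be strings and indices must start from "0"
    # (starting from 0 accommodates caption/grounding data).
    return {str(i): name for i, name in enumerate(class_names)}

classes = ["person", "bicycle", "car", "motorcycle", "airplane"]
label_map = make_label_map(classes)
print(json.dumps(label_map))
# -> {"0": "person", "1": "bicycle", "2": "car", "3": "motorcycle", "4": "airplane"}
```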

# odvg Dataset Format

The files are in jsonl format, with one json object per line, as follows:
Object Detection datasets utilize the 'detection' field. If dealing with an Object Detection dataset, an additional 'label_map' is required in the Dataset settings.
Visual Grounding datasets employ the 'grounding' field.
@@ -77,5 +82,6 @@ Visual Grounding datasets employ the 'grounding' field.
}
}
```

You can refer to the tools in ``./tools`` to convert other formats to the odvg data format.
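A conversion to the odvg jsonl layout can be sketched as below. One JSON object per line and the top-level 'detection'/'grounding' fields come from the description above; the inner field names ('filename', 'instances', 'bbox', 'label', 'category') are assumptions for illustration — check data_format.md for the authoritative schema:

```python
import json

def write_odvg_jsonl(records, path):
    # jsonl: one JSON object per line, as the odvg format requires.
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

# An Object Detection record uses the 'detection' field; labels must be
# indices into the dataset's label_map (which starts from 0).
record = {
    "filename": "000001.jpg",
    "height": 480,
    "width": 640,
    "detection": {
        "instances": [
            {"bbox": [10, 20, 110, 220], "label": 0, "category": "person"}
        ]
    },
}
write_odvg_jsonl([record], "od_train.jsonl")

# Reading it back, line by line:
with open("od_train.jsonl") as f:
    lines = [json.loads(line) for line in f]
```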

4 changes: 2 additions & 2 deletions test_dist.sh
@@ -12,5 +12,5 @@ python -m torch.distributed.launch --nproc_per_node=${GPU_NUM} main.py \
--eval \
-c ${CFG} \
--datasets ${DATASETS} \
--pretrain_model_path ./weights/groundingdino_swint_ogc.pth \
--options text_encoder_type=./bert_weights/bert-base-uncased
--pretrain_model_path /path/to/groundingdino_swint_ogc.pth \
--options text_encoder_type=/path/to/bert-base-uncased
4 changes: 2 additions & 2 deletions test_slurm.sh
@@ -17,5 +17,5 @@ srun -p ${PARTITION} \
-c ${CFG} \
--eval \
--datasets ${DATASETS} \
--pretrain_model_path ./weights/groundingdino_swint_ogc.pth \
--options text_encoder_type=./bert_weights/bert-base-uncased
--pretrain_model_path /path/to/groundingdino_swint_ogc.pth \
--options text_encoder_type=/path/to/bert-base-uncased
4 changes: 2 additions & 2 deletions train_dist.sh
@@ -11,5 +11,5 @@ python -m torch.distributed.launch --nproc_per_node=${GPU_NUM} main.py \
--output_dir ${OUTPUT_DIR} \
-c ${CFG} \
--datasets ${DATASETS} \
--pretrain_model_path ./weights/groundingdino_swint_ogc.pth \
--options text_encoder_type=./bert_weights/bert-base-uncased
--pretrain_model_path /path/to/groundingdino_swint_ogc.pth \
--options text_encoder_type=/path/to/bert-base-uncased
4 changes: 2 additions & 2 deletions train_slrum.sh → train_slurm.sh
@@ -17,5 +17,5 @@ srun -p ${PARTITION} \
python -u main.py --output_dir ${OUTPUT_DIR} \
-c ${CFG} \
--datasets ${DATASETS} \
--pretrain_model_path ./weights/groundingdino_swint_ogc.pth \
--options text_encoder_type=./bert_weights/bert-base-uncased
--pretrain_model_path /path/to/groundingdino_swint_ogc.pth \
--options text_encoder_type=/path/to/bert-base-uncased
