Hao Fang, Tong Zhang, Xiaofei Zhou, Xinxin Zhang
See installation instructions.
We provide a script, `train_net.py`, that can train all the configs provided in LBVQ.
To train a model with `train_net.py` on VIS, first set up the corresponding datasets following Preparing Datasets for LBVQ. Then run it with COCO-pretrained weights from the Model Zoo:
```shell
python train_net.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  MODEL.WEIGHTS mask2former_r50_coco.pkl
```
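The config name `bs8` suggests a total batch size of 8. If you train with fewer GPUs, the usual detectron2 convention is the linear scaling rule: shrink the total batch size and the base learning rate proportionally (via `SOLVER.IMS_PER_BATCH` and `SOLVER.BASE_LR` overrides). A minimal sketch of the arithmetic — the learning-rate value here is hypothetical, so check the actual config:

```python
def scale_solver(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: learning rate scales with total batch size."""
    return base_lr * new_batch / base_batch

# Hypothetical example: a config tuned for total batch size 8, trained
# instead on 4 GPUs with 1 clip each (total batch 4), halves the LR.
new_lr = scale_solver(base_lr=1e-4, base_batch=8, new_batch=4)
print(new_lr)  # 5e-05
```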
To evaluate a model's performance, use
```shell
python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth
```
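The AP/AR metrics used in VIS evaluation are based on spatio-temporal mask IoU: a predicted track and a ground-truth track are compared over every frame of the clip, not frame by frame in isolation. A minimal NumPy sketch of that IoU (illustrative only, not the evaluator this repo invokes):

```python
import numpy as np

def video_mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Spatio-temporal IoU between two boolean mask tracks of shape (T, H, W).

    Intersection and union are accumulated over all frames, so a track that
    misses the object in some frames is penalized accordingly.
    """
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 0.0

# Two 2-frame 4x4 tracks: pred covers columns 0-1, gt covers columns 1-2,
# so they share one column out of three occupied -> IoU = 8 / 24.
pred = np.zeros((2, 4, 4), dtype=bool); pred[:, :, :2] = True
gt = np.zeros((2, 4, 4), dtype=bool); gt[:, :, 1:3] = True
print(video_mask_iou(pred, gt))  # 0.3333...
```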
If you want to use SAM to refine your results, use
```shell
python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth SAM True
```
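SAM-style refiners are typically prompted with a box (or points) derived from the coarse mask they are asked to refine. How `SAM True` wires this up is specific to this repo, but extracting a tight box prompt from a binary mask looks roughly like the sketch below (illustrative, not the repo's actual pipeline):

```python
import numpy as np

def mask_to_box(mask: np.ndarray):
    """Tight (x0, y0, x1, y1) box around a boolean mask; None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True  # object occupies rows 2-4, columns 3-6
print(mask_to_box(mask))  # (3, 2, 6, 4)
```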
To visualize a video in the dataset, use
```shell
python demo_lbvq/demo.py --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --input datasets/ytvis_2019/valid/JPEGImages/xxxxxxx/*.jpg \
  --output output/demo --save-frames True \
  --opts MODEL.WEIGHTS lbvq_r50_ytvis2019.pth
```
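Under the hood, per-frame visualization amounts to alpha-blending each instance mask onto the frame in that instance's color. A minimal NumPy sketch of the blending step (illustrative only, not `demo.py` itself):

```python
import numpy as np

def overlay_mask(frame: np.ndarray, mask: np.ndarray,
                 color=(255, 0, 0), alpha=0.5) -> np.ndarray:
    """Alpha-blend a colored instance mask onto an RGB frame.

    frame: uint8 array of shape (H, W, 3); mask: bool array of shape (H, W).
    """
    out = frame.astype(np.float32)
    out[mask] = (1 - alpha) * out[mask] + alpha * np.asarray(color, np.float32)
    return out.astype(np.uint8)

# Gray 4x4 frame; blend red into the single masked pixel at (0, 0).
frame = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool); mask[0, 0] = True
blended = overlay_mask(frame, mask)
print(blended[0, 0])  # [177  50  50]
```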
Name | R-50 | R-101 |
---|---|---|
Mask2Former | model | model |
Name | vit_h |
---|---|
HQ-SAM | model |
Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|---|
LBVQ | R-50 | 52.2 | 74.8 | 57.7 | 49.9 | 59.8 | model |
LBVQ | R-101 | 53.1 | 76.3 | 60.2 | 50.0 | 59.2 | model |
Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|---|
LBVQ | R-50 | 44.8 | 67.4 | 46.0 | 41.6 | 52.3 | model |
Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|---|
LBVQ | R-50 | 22.2 | 45.3 | 19.0 | 12.4 | 27.5 | model |
The majority of LBVQ is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), and VITA (Apache-2.0 License).
If you use LBVQ in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
```BibTeX
@article{Fang2024learning,
  title={Learning Better Video Query with SAM for Video Instance Segmentation},
  author={Fang, Hao and Zhang, Tong and Zhou, Xiaofei and Zhang, Xinxin},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  publisher={IEEE}
}
```
Our code is largely based on Detectron2, Mask2Former, and VITA. We are truly grateful for their excellent work.