BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan,
Karthik Nandakumar, Salman Khan, and Rao Muhammad Anwer
Abstract
Medical foundation models (Med-FMs) are gaining prominence in the medical community for their ability to learn general representations from large collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks: a backdoored model classifies clean images accurately but fails when a specific trigger is present in the input. However, traditional backdoor attacks require a considerable amount of additional data to maliciously pre-train a model, which is often impractical in medical imaging applications because data are usually scarce. Inspired by recent developments in learnable prompts, this work introduces a method that embeds a backdoor into a medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and adding an imperceptible, learnable noise trigger to the input images, we exploit the full capabilities of Med-FMs. Our method, BAPLe, requires only a small subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling an effective backdoor attack. Through extensive experiments with four medical foundation models pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming baseline backdoor attack methods. Our work highlights the vulnerability of Med-FMs to backdoor attacks and aims to promote their safe adoption before deployment in real-world applications.
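The idea described in the abstract can be summarized as a joint optimization over the learnable text prompts and the noise trigger. The objective below is a generic sketch in our own notation, not the paper's exact equation: f denotes the frozen Med-FM with learnable prompt vectors p, D_c a clean few-shot subset, D_b the poisoned subset, y_t the attacker's target class, L a classification loss such as cross-entropy, and epsilon an assumed l-infinity budget that keeps the trigger imperceptible. The paper's actual formulation (for example, any additional trigger component or loss weighting) may differ.

```latex
\min_{\mathbf{p},\,\boldsymbol{\delta}}\;
\sum_{(x,\,y)\in\mathcal{D}_c} \mathcal{L}\big(f(x;\mathbf{p}),\,y\big)
\;+\;
\sum_{(x,\,y)\in\mathcal{D}_b} \mathcal{L}\big(f(x+\boldsymbol{\delta};\mathbf{p}),\,y_t\big)
\qquad \text{s.t.}\qquad \|\boldsymbol{\delta}\|_\infty \le \epsilon
```

The first term preserves accuracy on clean inputs; the second pushes any input carrying the trigger toward the target class, while the encoders themselves stay frozen.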
- June 17, 2024 : Accepted at MICCAI 2024
- Aug 12, 2024 : Released code for BAPLe
- Aug 12, 2024 : Released pre-trained models (MedCLIP, BioMedCLIP, PLIP, QuiltNet)
- Aug 30, 2024 : Released instructions for preparing datasets (COVID, RSNA18, MIMIC, Kather, PanNuke, DigestPath)
For more details, please refer to our project web page or our arXiv paper.
- Installation
- Models
- Datasets
- Code Structure
- Run Experiments
- Results
- Citation
- Contact
- Acknowledgement
- Create a conda environment
conda create --name baple python=3.8
conda activate baple
- Install PyTorch and other dependencies
git clone https://github.com/asif-hanif/baple
cd baple
bash setup_env.sh
Our code uses the Dassl codebase for dataset handling and training.
We have shown the efficacy of BAPLe on four medical foundation models:
MedCLIP, BioMedCLIP, PLIP, QuiltNet
Download the pre-trained models using the links provided below. Place these models in a directory named med-vlms and set the MODEL_ROOT variable in the shell scripts to the path of this directory.
Model | Link | Size |
---|---|---|
CLIP | Download | 1.1 GB |
MedCLIP | Download | 0.9 GB |
BioMedCLIP | - | - |
PLIP | Download | 0.4 GB |
QuiltNet | Download | 2.7 GB |
All-Models | Download | 5.0 GB |
Models should be organized according to the following directory structure:
med-vlms/
├── clip/
├── medclip/
├── biomedclip/
├── plip/
└── quiltnet/
We have performed experiments on the following six medical classification datasets:
COVID, RSNA18, MIMIC, Kather, PanNuke, DigestPath
We provide instructions for downloading and processing the datasets used by our method in DATASETS.md.
Dataset | Type | Classes | Link |
---|---|---|---|
COVID | X-ray | 2 | Instructions |
RSNA18 | X-ray | 3 | Instructions |
MIMIC | X-ray | 5 | Instructions |
Kather | Histopathology | 9 | Instructions |
PanNuke | Histopathology | 2 | Instructions |
DigestPath | Histopathology | 2 | Instructions |
All datasets should be placed in a directory named med-datasets, and the path of this directory should be specified in the variable DATASET_ROOT in the shell scripts. The directory structure should be as follows:
med-datasets/
├── covid/
│   ├── images/
│   ├── train/
│   ├── test/
│   └── classnames.txt
├── rsna18/
├── mimic/
├── kather/
├── pannuke/
└── digestpath/
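Before running the scripts, it can help to check that DATASET_ROOT matches the layout above. The Python sketch below is a hypothetical helper, not part of the BAPLe repository: it assumes every dataset folder mirrors the covid/ layout shown (images/, train/, test/, classnames.txt) and that classnames.txt lists one class name per line. Neither assumption is confirmed by DATASETS.md.

```python
from pathlib import Path

# Assumption: every dataset folder mirrors the covid/ layout shown above.
EXPECTED_ENTRIES = ("images", "train", "test", "classnames.txt")
DATASETS = ("covid", "rsna18", "mimic", "kather", "pannuke", "digestpath")


def missing_entries(dataset_root: str, datasets=DATASETS) -> dict:
    """Return {dataset_name: [missing entries]} for every incomplete dataset folder."""
    root = Path(dataset_root)
    report = {}
    for name in datasets:
        missing = [entry for entry in EXPECTED_ENTRIES if not (root / name / entry).exists()]
        if missing:
            report[name] = missing
    return report


def read_classnames(dataset_dir: str) -> list:
    """Read classnames.txt, assuming (hypothetically) one class name per line."""
    lines = (Path(dataset_dir) / "classnames.txt").read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, missing_entries("med-datasets") returns an empty dict when every expected entry is present, and otherwise lists what is missing per dataset.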
Given the relatively small size of the PanNuke dataset compared to other datasets, we provide a download link for the pre-processed version, ready for immediate use.
Dataset | Link | Size |
---|---|---|
PanNuke | Download | 531 MB |
The BAPLe code structure is borrowed from CoOp. We add attack-related code to the Dataset class and to the forward() method of each model class. When the dataset object is instantiated, backdoor tags are assigned to training samples in the DatasetWrapper class in this file. Training samples whose backdoor tag is 1 are treated as poisoned and are transformed into backdoor samples. This transformation is performed in the forward() method of each model class, and the code for it is in the trainers/backdoor.py file. The model class for CLIP, PLIP, and QuiltNet can be accessed here, for MedCLIP here, and for BioMedCLIP here. Prompt learning is managed by the PromptLearner class in each trainer file.
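The tagging logic described above can be illustrated with a small, framework-free sketch. The function names, the poisoning ratio, the seeding scheme, and the assumption that a poisoned sample's label is replaced by the attack's target class are hypothetical and ours, not taken from DatasetWrapper or trainers/backdoor.py; the trigger transformation applied in forward() is not modeled here.

```python
import random


def assign_backdoor_tags(num_samples: int, poison_ratio: float, seed: int = 0) -> list:
    """Return one 0/1 tag per training sample; round(poison_ratio * num_samples) get tag 1.

    Hypothetical stand-in for the tagging done in DatasetWrapper.
    """
    if not 0.0 <= poison_ratio <= 1.0:
        raise ValueError("poison_ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the same samples are tagged on every run
    num_poison = round(poison_ratio * num_samples)
    poisoned = set(rng.sample(range(num_samples), num_poison))
    return [1 if i in poisoned else 0 for i in range(num_samples)]


def backdoor_labels(labels: list, tags: list, target_class: int) -> list:
    """Labels used by the loss: tagged samples are assumed to be relabeled to target_class."""
    if len(labels) != len(tags):
        raise ValueError("labels and tags must have the same length")
    return [target_class if tag == 1 else y for y, tag in zip(labels, tags)]
```

With the default seed, assign_backdoor_tags(64, 0.1) tags round(6.4) = 6 of the 64 samples, and repeated calls return the same tags.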
We performed all experiments on an NVIDIA RTX A6000 GPU. Shell scripts to run experiments can be found in the scripts folder. The following shell commands run experiments on different models and datasets:
## General Command Structure
bash <SHELL_SCRIPT> <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>
## MedCLIP
bash scripts/medclip.sh medclip covid medclip_ep50 32
bash scripts/medclip.sh medclip rsna18 medclip_ep50 32
bash scripts/medclip.sh medclip mimic medclip_ep50 32
## BioMedCLIP
bash scripts/biomedclip.sh biomedclip covid biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip rsna18 biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip mimic biomedclip_ep50 32
## PLIP
bash scripts/plip.sh plip kather plip_ep50 32
bash scripts/plip.sh plip pannuke plip_ep50 32
bash scripts/plip.sh plip digestpath plip_ep50 32
## QuiltNet
bash scripts/quiltnet.sh quiltnet kather quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet pannuke quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet digestpath quiltnet_ep50 32
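Because every run follows the general command structure, the full grid of training commands above can be generated programmatically. The sketch below is a hypothetical convenience, not part of the repository; it only reproduces the model/dataset pairs and the <MODEL_NAME>_ep50 config naming that appear in the commands above, and it assumes the shell scripts accept other NUM_SHOTS values if you change num_shots.

```python
# Model -> datasets pairing copied from the commands above.
EXPERIMENTS = {
    "medclip": ("covid", "rsna18", "mimic"),
    "biomedclip": ("covid", "rsna18", "mimic"),
    "plip": ("kather", "pannuke", "digestpath"),
    "quiltnet": ("kather", "pannuke", "digestpath"),
}


def build_commands(num_shots: int = 32, epochs_tag: str = "ep50") -> list:
    """Expand <SHELL_SCRIPT> <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>."""
    commands = []
    for model, datasets in EXPERIMENTS.items():
        config = f"{model}_{epochs_tag}"  # e.g. medclip_ep50
        for dataset in datasets:
            commands.append(f"bash scripts/{model}.sh {model} {dataset} {config} {num_shots}")
    return commands
```

To launch the runs one after another from the repository root, each string could be passed to subprocess.run(cmd.split(), check=True).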
Results are saved in JSON format in the results directory. To process results (i.e., average across all target classes), run the following command with appropriate arguments:
python results/process_results.py --model <MODEL_NAME> --dataset <DATASET_NAME>
Examples
python results/process_results.py --model medclip --dataset covid
python results/process_results.py --model biomedclip --dataset covid
python results/process_results.py --model plip --dataset kather
python results/process_results.py --model quiltnet --dataset kather
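Averaging across target classes amounts to taking the mean of each metric over the per-target results. The sketch below illustrates that reduction only; it is not results/process_results.py, and the JSON schema it assumes (one object per target class, with hypothetical keys such as target_class, clean_acc, and backdoor_acc) and the *.json file pattern are our assumptions.

```python
import json
from pathlib import Path
from statistics import mean


def load_records(result_dir: str) -> list:
    """Load every *.json file in result_dir (assumed: one JSON object per target class)."""
    return [json.loads(p.read_text()) for p in sorted(Path(result_dir).glob("*.json"))]


def average_over_targets(records: list, exclude=("target_class",)) -> dict:
    """Mean of each numeric metric across target-class records, skipping identifier keys."""
    if not records:
        raise ValueError("no result records to average")
    metric_keys = [
        key for key, value in records[0].items()
        if key not in exclude
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    ]
    return {key: mean(record[key] for record in records) for key in metric_keys}
```

For a hypothetical folder holding one file per target class, average_over_targets(load_records(folder)) returns one averaged value per metric.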
To evaluate already saved models, run the following command with appropriate arguments:
bash scripts/eval.sh <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>
Examples
bash scripts/eval.sh medclip covid medclip_ep50 32
bash scripts/eval.sh biomedclip covid biomedclip_ep50 32
bash scripts/eval.sh plip kather plip_ep50 32
bash scripts/eval.sh quiltnet kather quiltnet_ep50 32
If you find our work, this repository, or the pre-trained models useful, please consider giving the repository a star ⭐ and citing our paper.
@InProceedings{Han_BAPLe_MICCAI2024,
author = {Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Shahbaz Khan, Fahad and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
title = {{BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning}},
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15012},
month = {October},
pages = {pending}
}
Should you have any questions, please create an issue on this repository or contact us at [email protected]
We used the CoOp codebase for training (few-shot prompt learning) and inference of models for our proposed method, BAPLe. We thank the authors for releasing their codebase.