combinatorial_biomarker_testing

This is the computational code for biomarker testing algorithm developed by the Popel Systems Biology Laboratory (https://popellab.johnshopkins.edu/) at Johns Hopkins University.

Description of contents:

feature_selection: contains a python script to run random forest-based feature selection and a bash script for automation.
input_files: input folder for cutoff-based biomarker testing algorithm; contains data on pre-treatment and on-treatment biomarker levels of virtual patients (.csv files), feature selection output files (.csv) and an annotation file (biomarker_names_units_v1.txt)
plotting: contains plotting scripts (R scripts)
src: R scripts used for biomarker combination analysis
src_single: R scripts used for single biomarker analysis
rank_bio_combo_main_parallel.R: main R script for biomarker combination analysis; runs in parallel
rank_single_biomarkers_main.R: main R script for single biomarker analysis
run_all.sh: bash script to run the cutoff-based biomarker testing

Steps to run the biomarker testing algorithm:

Note: Please install the following R packages before running the code: "ggplot2", "ggpattern", "writexl", "readxl", "parallel","tidyverse", "gghighlight", "ggsignif", "fabricatr".

To run the random forest based feature selection: (optional as output files are already provided in the folder input_files)

copy the input files (biomarker levels in virtual patients & annotation file from folder input_files) into feature_selection folder
execute the bash script run_rfe.sh (in folder feature_selection).
output .csv files will be generated

To run the cutoff-based biomarker testing (for both single biomarkers and combinations)

Transfer the output files of feature selection containing selected biomarker candidates to folder input_files
execute the bash script run_all.sh
a new folder output_files will be generated containing multiple subfolders (baseline, d15, d30, relative_d15, relative_d30)

To analyze the results and generate figures for individual timepoints

copy the scripts from folder Plotting/individual_timept, into the output subfolders (baseline, d15, d30, relative_d15, relative_d30)
set the variable timept in plotall.R. This should be the same as the folder name.
execute the R script plotall.R, which inturn runs other R scripts

To generate figures comparing different timepoints

copy the scripts from folder Plotting/compare_timepts, into the output subfolders (baseline, d15, d30, relative_d15, relative_d30)
execute the R script plotall_timepts.R

Reference: Virtual patient analysis identifies strategies to improve the performance of predictive biomarkers for PD-1 blockade. Theinmozhi Arulraj, Hanwen Wang, Atul Deshpande, Ravi Varadhan, Leisha A. Emens, Elizabeth M. Jaffee, Elana J. Fertig, Cesar A. Santa-Maria, Aleksander S. Popel bioRxiv 2024.05.21.595235; doi: https://doi.org/10.1101/2024.05.21.595235

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

combinatorial_biomarker_testing

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
feature_selection		feature_selection
input_files		input_files
plotting		plotting
src		src
src_single		src_single
LICENSE		LICENSE
README.md		README.md
rank_bio_combo_main_parallel.R		rank_bio_combo_main_parallel.R
rank_single_biomarkers_main.R		rank_single_biomarkers_main.R
run_all.sh		run_all.sh

License

popellab/combinatorial_biomarker_testing

Folders and files

Latest commit

History

Repository files navigation

combinatorial_biomarker_testing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages