CaDrA.shiny
An R Shiny Dashboard for Interacting with CaDrA Package
CaDrA Package: https://montilab.github.io/CaDrA/
Web Portal: https://cadra.bu.edu/
CaDrA.shiny is an interactive R Shiny dashboard developed to allow users to directly interact with CaDrA package. CaDrA is an R package that supports a heuristic search framework aimed at identifying candidate drivers of a molecular phenotype of interest (visit our Github repo for more details).
The CaDrA’s Shiny dashboard has two distinctive features:
- Run CaDrA search to identify candidate drivers of a molecular phenotype of interest.
- Run GSVA to estimate aggregate enrichment scores by projecting a (gene) expression dataset onto a given gene set or signature, usually representing a molecular phenotype. Afterward, one can apply CaDrA search to look for genetic drivers based on a given binary multi-omics dataset and its derived enrichment score of a signature of interest.
Data visualizations include:
- Meta-feature plot.
- Kolmogorov Smirnov (KS) enrichment plot.
- Top N candidates overlapping heatmap.
- Permutation plot.
CaDrA.shiny is currently containerized using Docker and can be deployed on any compatible cloud-based services.
Docker image: montilab/cadra-shiny
# Install BiocManager
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
# Install CaDrA
BiocManager::install("CaDrA")
# Install GSVA
BiocManager::install("GSVA")
# Install hypeR
library(devtools)
devtools::install_github("montilab/hypeR")
# Install CaDrA.shiny
devtools::install_github("montilab/CaDrA.shiny")
library(CaDrA.shiny)
library(CaDrA)
library(GSVA)
library(hypeR)
Here, we show how to run CaDrA on a dataset downloaded from the CaDrA Portal, using input_score’s derived by applying GSVA to the downloaded gene expression dataset and the Hallmarks genesets. We will run a CaDrA search to look for genetic drivers of the “Epithelial Mesenchymal Transition” (EMT)-derived score.
## Get a list of descriptors of feature sets available on CaDrA Portal
fs_list <- CaDrA.shiny::get_feature_set(order_by="asc")
## Show the description of the first few feature sets
knitr::kable(head(fs_list), row.names = FALSE)
feature_set_name | description | collection |
---|---|---|
CCLE SCNAs and Mutations | Somatic copy number alterations and mutations from CCLE. See ?CaDrA::CCLE_MUT_SCNA | CCLE |
Simulated Feature Set | Simulated feature set comprises of 1000 genomic features and 100 sample profiles. This simulated data includes 10 left-skewed (i.e. True Positive or TP) and 990 uniformly-distributed (i.e. True Null or TN) features. See ?CaDrA::BRCA_GISTIC_MUT_SIG | Simulated |
TCGA BrCa SCNAs and Mutations | Somatic copy number alterations and mutations from BRCA TCGA. See ?CaDrA::BRCA_GISTIC_MUT_SIG | TCGA |
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | ACC | TCGA |
TCGA_BLCA_2016_01_28_GISTIC_MUT_SIG | BLCA | TCGA |
TCGA_BRCA_2016_01_28_GISTIC_MUT_SIG | BRCA | TCGA |
## Retrieve the dataset (both genetic feature set and gene expression)
datasets <- CaDrA.shiny::pull_datasets(
feature_set = "TCGA_HNSC_2016_01_28_GISTIC_MUT_SIG",
include_gene_expression = TRUE
)
datasets
$feature_set
class: RangedSummarizedExperiment
dim: 12852 279
metadata(3): experimentData annotation protocolData
assays(1): exprs
rownames(12852): Amp2q11.2 Amp2q31.2 ... ZZEF1 ZZZ3
rowData names(1): Feature
colnames(279): TCGA-BA-4074-01 TCGA-BA-4076-01 ... TCGA-IQ-7631-01
TCGA-IQ-7632-01
colData names(0):
$gene_expression
class: RangedSummarizedExperiment
dim: 20234 566
metadata(3): experimentData annotation protocolData
assays(1): exprs
rownames(20234): SMR3B STATH ... SCARNA27 MS4A5
rowData names(1): Genes
colnames(566): TCGA-4P-AA8J-01 TCGA-BA-4074-01 ... TCGA-WA-A7GZ-01
TCGA-WA-A7H4-01
colData names(1): Samples
## download MSigDB’s Hallmark genesets
hallmarks <- hypeR::msigdb_gsets("Homo sapiens", "H", clean=TRUE)$genesets # returns 50 genesets
# Compute the gsva scores of the 50 hallmark genesets
input_score_matrix <- GSVA::gsva(
expr = SummarizedExperiment::assay(datasets$gene_expression),
gset.idx.list = hallmarks,
method = "gsva",
mx.diff = TRUE,
verbose = FALSE
)
## Show few entries of the returned hallmark-by-sample matrix
knitr::kable(input_score_matrix[1:5, 1:5])
TCGA-4P-AA8J-01 | TCGA-BA-4074-01 | TCGA-BA-4075-01 | TCGA-BA-4076-01 | TCGA-BA-4077-01 | |
---|---|---|---|---|---|
Adipogenesis | -0.0160921 | 0.1648353 | 0.2211107 | -0.0737085 | -0.0115116 |
Allograft Rejection | -0.0049091 | -0.0081298 | -0.0447976 | -0.3880485 | 0.4610353 |
Androgen Response | -0.3956958 | 0.0941687 | 0.1397883 | 0.1266452 | 0.2823991 |
Angiogenesis | 0.3141972 | 0.2880628 | 0.3441984 | -0.1819343 | 0.0226343 |
Apical Junction | 0.1902571 | -0.2399212 | -0.1048696 | -0.0193586 | 0.1687033 |
## Samples to keep based on the overlap between the two inputs
sample_overlap <- intersect(colnames(input_score_matrix), colnames(datasets$feature_set))
input_score <- input_score_matrix["Epithelial Mesenchymal Transition", sample_overlap]
FS <- datasets$feature_set[, sample_overlap, drop = FALSE]
## Pre-filter FS based on occurrence frequency
FS_filtered <- CaDrA::prefilter_data(
FS = FS,
max_cutoff = 0.6, # max event frequency (60%)
min_cutoff = round(5/ncol(FS), 2) # make sure min event frequency has at least 5 samples
)
## Run candidate search
topn_result <- CaDrA::candidate_search(
FS = FS_filtered,
input_score = input_score,
method = "ks_pval", # Use Kolmogorov-Smirnov scoring function
method_alternative = "less", # Use one-sided hypothesis testing
weights = NULL, # If weights are provided, perform a weighted-KS (gsea-like) test
search_method = "both", # Apply both forward and backward search
top_N = 1, # Perform only one search (starting from top scoring feature)
max_size = 7, # Maximum number of features to include in the returned meta-feature
do_plot = FALSE, # Plot after finding the best features
best_score_only = FALSE # Return all results from the search
)
## Fetch the meta-feature yielding the best score over N searches
## .. (in this example, only N=1 search was performed)
topn_best_meta <- CaDrA::topn_best(topn_result)
## Visualize the best results with the meta-feature plot
CaDrA::meta_plot(topn_best_list = topn_best_meta, input_score_label = NULL)
## Permutation seed (for reproducible results)
set.seed(123)
## Run CaDrA Search 100 times on permuted input scores to estimate the null distribution
perm_res <- CaDrA::CaDrA(
FS = FS_filtered,
input_score = input_score,
method = "ks_pval",
method_alternative = "less",
top_N = 1,
max_size = 7,
search_method = "both",
n_perm = 100,
perm_alternative = "one.sided",
ncores = 2,
cache = FALSE
)
## Visualize permutation results
CaDrA::permutation_plot(perm_res = perm_res)
Here, we show how to launch a local instance of the CaDrA Portal, which will be populated with a user-selected set of datasets. In the example, a single dataset (ACC) will be uploaded to the portal.
# Download feature sets from CaDrA portal and return a datalist with appropriate paths to dataset
mydatafile <- CaDrA.shiny::download_feature_sets(
#feature_set = fs_list$feature_set_name, # this would download all TCGA datasets
feature_set = "TCGA_ACC_2016_01_28_GISTIC_MUT_SIG",
include_input_score = TRUE,
include_gene_expression = TRUE,
out_dir = file.path(Sys.getenv("HOME"), "Github") # specify your folder of choice here
)
# Look at the top 6 rows
knitr::kable(head(mydatafile), row.names = FALSE)
feature_set_name | feature_set_path | input_score_name | input_score_path | gene_expression_name | gene_expression_path |
---|---|---|---|---|---|
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/feature_set/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG.rds | NA | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/input_score/NA | TCGA_ACC_2016_01_28_Gene_Expression | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/gene_expression/TCGA_ACC_2016_01_28_Gene_Expression.rds |
TCGA_ACC_2016_01_28_GISTIC_MUT_SIG | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/feature_set/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG.rds | NA | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/input_score/NA | TCGA_ACC_2016_01_28_Gene_Expression | /Users/reinachau/Github/download-fs-2023-11-27/TCGA_ACC_2016_01_28_GISTIC_MUT_SIG/gene_expression/TCGA_ACC_2016_01_28_Gene_Expression.rds |
## Launch CaDrA's Shiny app with your downloaded datalist retrieved from (ii)
app <- CaDrA.shiny::CaDrA_App(id="myapp", datalist=mydatafile)
## Launch app on localhost with port 3838
shiny::runApp(app, host='0.0.0.0', port=3838)
There are five tabs on CaDrA’s Shiny dashboard. Explore each tab and see what they do.
To get help with CaDrA, visit our Github CaDrA dicussion or Github CaDrA issues.
To get help with CaDrA.shiny, visit our Github CaDrA.shiny dicussion or Github CaDrA.shiny issues.