forked from kwonsang/denovo

Discovering Heterogeneous Exposure Effects Using Randomization Inference

denovo

The de novo method is developed within a causal inference framework and in the context of matched observational studies. The denovo R package implements a novel statistical method that discovers subgroups whose causal effects of the variable of interest (e.g., air pollution on mortality) are statistically significantly different from the population average.

The method splits the data into two sub-samples. In the first sub-sample, we let the data discover "promising" subgroups whose air pollution effects differ from the population mean, using machine learning approaches such as classification and regression trees (CART) and Causal Tree. In the second sub-sample, we use randomization-based hypothesis tests to confirm whether the exposure effects for the newly discovered subgroups are statistically significantly different from the population average causal effect.
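As a rough illustration of the sample-splitting step, the sketch below divides synthetic matched pairs into a discovery sub-sample and an inference sub-sample using base R only. The data, the helper `split_pairs`, and the 25% discovery fraction are all assumptions made for illustration; they are not part of the denovo package.

```r
# Hypothetical synthetic matched pairs: one treated and one control outcome per pair.
set.seed(1)
n_pairs <- 200
covars <- data.frame(
  age    = round(runif(n_pairs, 65, 95)),
  female = rbinom(n_pairs, 1, 0.5)
)
cr <- rnorm(n_pairs)                                   # control outcomes
tr <- cr + 0.5 + 0.5 * covars$female + rnorm(n_pairs)  # treated outcomes

# Illustrative helper (not a denovo function): randomly assign a fraction of
# the pairs to the discovery sub-sample and the rest to the inference sub-sample.
split_pairs <- function(n, frac = 0.25) {
  discovery <- sort(sample.int(n, size = floor(frac * n)))
  list(discovery = discovery, inference = setdiff(seq_len(n), discovery))
}

idx <- split_pairs(n_pairs)

# First sub-sample: used only to discover candidate subgroups.
tr_1     <- tr[idx$discovery]
cr_1     <- cr[idx$discovery]
covars_1 <- covars[idx$discovery, , drop = FALSE]

# Second sub-sample: used only to test the discovered subgroups.
tr_2     <- tr[idx$inference]
cr_2     <- cr[idx$inference]
covars_2 <- covars[idx$inference, , drop = FALSE]
```

Keeping the two sets of pairs disjoint is what allows the second-stage tests to be interpreted without accounting for the data-driven search in the first stage.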

Installation

Use the following instructions to install the denovo package from source:

install.packages("devtools")
library(devtools)
install_github("fasrc/denovo")

There are two R packages (causalTree & Gurobi) that cannot be installed from CRAN. Users need to install these packages manually. To install the "causalTree" package, please use the following instruction (see causalTree for more details):

install.packages("devtools")
library(devtools) 
install_github("susanathey/causalTree")

The denovo package uses the Gurobi optimizer in sensitivity analyses. Gurobi is available for academic use; download and install it from the Gurobi website. For the R wrapper, please see the Gurobi installation instructions.
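Before running any analysis, it can help to confirm that the manually installed dependencies are visible to R. The snippet below is a minimal sketch using base R's `requireNamespace`; it assumes the packages load under the names `causalTree`, `gurobi`, and `denovo`.

```r
# Package names are assumed from the installation notes above.
deps <- c("causalTree", "gurobi", "denovo")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

missing_deps <- deps[!installed]
if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All dependencies are available.")
}
```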

Getting Started

denovo functions can be used for both binary and continuous outcomes. The discover_subgroups function takes the first sub-sample and generates a classification and regression tree.

discovered_tree <- discover_subgroups(tr_1, cr_1, covars_1)

In this function, tr is the vector of treated outcomes, cr is the vector of control outcomes, and covars is a data.frame of covariates. The output is the discovered tree, that is, the classification and regression prediction model. The estimate_subgroups_sig function receives the second sub-sample together with the prediction model and estimates the significance of each subgroup.
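To make the discovery step more concrete, the sketch below fits a regression tree to treated-minus-control pair differences with the rpart package. This is only an illustration of the general CART idea under assumed synthetic data; it is not the implementation of discover_subgroups, and the tuning values (`minbucket`, `cp`) are arbitrary.

```r
library(rpart)

# Hypothetical first sub-sample in which the effect is larger for pairs older than 80.
set.seed(2)
n <- 300
covars_1 <- data.frame(
  age    = runif(n, 65, 95),
  female = rbinom(n, 1, 0.5)
)
cr_1 <- rnorm(n)
tr_1 <- cr_1 + 0.3 + 1.5 * (covars_1$age > 80) + rnorm(n, sd = 0.5)

# Treated-minus-control difference for each matched pair.
pair_diff <- tr_1 - cr_1

# Regression tree on the pair differences; each leaf is a candidate subgroup.
fit <- rpart(
  pair_diff ~ .,
  data    = cbind(covars_1, pair_diff = pair_diff),
  method  = "anova",
  control = rpart.control(minbucket = 20, cp = 0.02)
)

# fit$where gives, for each pair, the row of fit$frame for the leaf it falls in.
leaf <- fit$where
leaf_means <- tapply(pair_diff, leaf, mean)
print(leaf_means)
```

Leaves whose mean difference sits far from the overall mean are the "promising" subgroups that would then be carried into the second sub-sample for testing.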

analysis <- estimate_subgroups_sig(tr_2, cr_2, covars_sig_2,
                                     tree = discovered_tree$tree,
                                     significance = total_significance,
                                     gamma = gamma)
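
The idea behind a randomization-based test for matched pairs can be sketched in a few lines of base R. Under the null hypothesis that a subgroup's effect equals a reference value, the pair differences minus that value are equally likely to take either sign, so random sign flips generate a reference distribution. The helper `sign_flip_pvalue`, the synthetic data, and the use of the sample mean `tau0` as a stand-in for the population average are all assumptions for illustration. This sketch ignores the sensitivity-analysis and multiple-testing aspects and is not the procedure implemented in estimate_subgroups_sig.

```r
# Illustrative helper (not a denovo function): two-sided sign-flip test
# of whether the centered pair differences d have mean zero.
sign_flip_pvalue <- function(d, n_draws = 2000) {
  observed <- abs(mean(d))
  flipped <- replicate(
    n_draws,
    abs(mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
  )
  # Add-one correction keeps the Monte Carlo p-value strictly positive.
  (1 + sum(flipped >= observed)) / (1 + n_draws)
}

# Hypothetical second sub-sample with a larger effect inside one subgroup.
set.seed(3)
n <- 400
in_subgroup <- rbinom(n, 1, 0.3) == 1
pair_diff <- 0.5 + 1.0 * in_subgroup + rnorm(n)

# Simplification: treat the overall mean difference as the population average.
tau0 <- mean(pair_diff)

# Test whether the subgroup's effect differs from tau0.
p_sub <- sign_flip_pvalue(pair_diff[in_subgroup] - tau0)
print(p_sub)
```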

The estimate_exposure_eff function combines the functions above to discover effect modification under the assumption of no unmeasured confounders. See the following section for more details.

Analyses on Synthetic Data

We provide analyses on synthetic data to address the following research question: which population subgroups have causal effects of air pollution on mortality that are statistically significantly different from the population average? The actual study was conducted on Medicare data; because those data are not publicly available, we repeat the process with synthetic data. These analyses are further discussed in Lee et al. (2021). Please refer to the following link for more details.

References

  • Lee, K., Small, D.S. and Dominici, F., 2021. Discovering Heterogeneous Exposure Effects Using Randomization Inference in Air Pollution Studies. Journal of the American Statistical Association, pp.1-12.
  • Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J., 1984. Classification and Regression Trees. New York: Chapman & Hall/CRC.
  • Athey, S. and Imbens, G., 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), pp.7353-7360.
