Default cache path for rstan model causes rbmi to crash when run by the same user in parallel / inconsistent implementation and comment #466
Comments
@luwidmer - Thanks for the report, will take a look!
@gravesti: If you run it multiple times in each process, rstan prints the message "recompiling to avoid crashing R session" (set `quiet = FALSE` in `draws()` to see it), and if this happens simultaneously in multiple processes, it can cause spurious syntax errors in the Stan model and other odd symptoms.
Hmm, bit of an awkward one. According to the Stan team this shouldn't happen if you call

For reference, I've been using the following code to try and re-create this, launching multiple background processes from bash / Rscript at the same time:

```r
library(parallel)
cl <- makeCluster(6, outfile = "")
runme <- function(i) {
  library(rbmi)
  library(dplyr)
  data("antidepressant_data")
  dat <- antidepressant_data
  dat <- expand_locf(
    dat,
    PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT
    VISIT = levels(dat$VISIT),
    vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
    group = c("PATIENT"),
    order = c("PATIENT", "VISIT")
  )
  dat_ice <- dat %>%
    arrange(PATIENT, VISIT) %>%
    filter(is.na(CHANGE)) %>%
    group_by(PATIENT) %>%
    slice(1) %>%
    ungroup() %>%
    select(PATIENT, VISIT) %>%
    mutate(strategy = "JR")
  dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618), ]
  vars <- set_vars(
    outcome = "CHANGE",
    visit = "VISIT",
    subjid = "PATIENT",
    group = "THERAPY",
    covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
  )
  method <- method_bayes(
    burn_in = 100,
    burn_between = 1,
    n_samples = 200
  )
  print("hi")
  drawObj <- draws(
    data = dat,
    data_ice = dat_ice,
    vars = vars,
    method = method,
    quiet = TRUE
  )
  drawObj
}
# Run once in the main session first, then 50 times across the 6 workers
runme()
x <- clusterApply(cl, 1:50, runme)
stopCluster(cl)
```

That being said, even if this did work as expected, we'd still have race condition issues if it was the first time that the model needs to be compiled or if the model cache needs to be refreshed.

I agree with the proposed solution, though I feel torn on it: for a lot of people who use rbmi interactively and not in parallel, it will lead to slightly poorer UX, as they will have to wait for the model to be recompiled in every new session, which is pretty slow in rstan. cmdstanr offers much faster compile times but isn't yet available on CRAN and is more complicated for end users to install / set up.

I was hoping there might be a flag or something we can use to detect if it is being run in parallel, but I couldn't find anything obvious; I also appreciate it likely wouldn't be reliable due to all the different ways that it could be run in parallel.

References: https://stackoverflow.com/questions/54195899/recompiling-to-avoid-crashing-r-session

Apparently I had raised a similar question on this in the Stan forum in the past (which I had completely forgotten about 😓).
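One way to close the first-compile race without giving up the persistent per-user cache (and its faster interactive UX) is to serialize the compile step across processes with a lock. The sketch below is base R only; `with_dir_lock()` is a hypothetical helper, not part of rbmi or rstan. It relies on `dir.create()` failing when the directory already exists (a single `mkdir` call on the file system), so only one process can hold the lock at a time.

```r
# Hypothetical helper (not part of rbmi or rstan): run `fun` while holding a
# cross-process lock. The lock is a directory: dir.create() returns FALSE if
# the directory already exists, so only one process can acquire it at a time.
with_dir_lock <- function(lock_dir, fun, timeout = 60, poll = 0.1) {
  # Make sure the parent exists, otherwise dir.create() would always fail
  dir.create(dirname(lock_dir), recursive = TRUE, showWarnings = FALSE)
  start <- Sys.time()
  # Retry until we create the lock directory or the timeout expires
  while (!dir.create(lock_dir, showWarnings = FALSE)) {
    waited <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (waited > timeout) {
      stop("timed out after ", timeout, "s waiting for lock ", lock_dir)
    }
    Sys.sleep(poll)
  }
  # Release the lock even if `fun` throws an error
  on.exit(unlink(lock_dir, recursive = TRUE), add = TRUE)
  fun()
}
```

The compile-or-load step would then be wrapped as, for example, `with_dir_lock(file.path(cache_dir, "compile.lock"), function() { ... })`, where `cache_dir` and the lock name are illustrative. A lock left behind by a killed process has to be removed by hand, which is the main cost of this approach.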
@gowerc Just running the vignette several times, e.g., with `quiet = FALSE`, will cause recompilation, at least on my Mac with rbmi 1.3.1 and R 4.4.1 (rstan 2.32.6):

```r
library(rbmi)
library(dplyr)
data("antidepressant_data")
dat <- antidepressant_data
# Use expand_locf to add rows corresponding to visits with missing outcomes to the dataset
dat <- expand_locf(
  dat,
  PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT
  VISIT = levels(dat$VISIT),
  vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
  group = c("PATIENT"),
  order = c("PATIENT", "VISIT")
)
# create data_ice and set the imputation strategy to JR for
# each patient with at least one missing observation
dat_ice <- dat %>%
  arrange(PATIENT, VISIT) %>%
  filter(is.na(CHANGE)) %>%
  group_by(PATIENT) %>%
  slice(1) %>%
  ungroup() %>%
  select(PATIENT, VISIT) %>%
  mutate(strategy = "JR")
# In this dataset, subject 3618 has intermittent missing values which do not correspond
# to a study drug discontinuation. We therefore remove this subject from `dat_ice`.
# (In the later imputation step, it will automatically be imputed under the default MAR assumption.)
dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618), ]
# Define the names of key variables in our dataset and
# the covariates included in the imputation model using `set_vars()`
# Note that the covariates argument can also include interaction terms
vars <- set_vars(
  outcome = "CHANGE",
  visit = "VISIT",
  subjid = "PATIENT",
  group = "THERAPY",
  covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
)
# Define which imputation method to use (here: Bayesian multiple imputation with 150 imputed datasets)
method <- method_bayes(
  burn_in = 200,
  burn_between = 5,
  n_samples = 150
)
# Create samples for the imputation parameters by running the draws() function
set.seed(987)
drawObj <- draws(
  data = dat,
  data_ice = dat_ice,
  vars = vars,
  method = method
)
drawObj
```

This yields:
Agree with cmdstanr being a nicer solution to this (we also use it very extensively in our open-source book: https://opensource.nibr.com/bamdd/src/01b_basic_workflow.html#r-session-setup). Then, for parallel use cases, one can pre-compile once and then re-use the compiled model.
@gowerc, on my Mac, changing your example to use 12 cores causes this error almost immediately with rbmi 1.3.0 / rstan 2.32.6 / R 4.4.1:
Disregard my post here: on rbmi 1.3.1 it seems to work (at least using clustermq multiprocessing).
@luwidmer - As in, the program never crashes on 1.3.1? I must admit that surprises me even more 😓. All we changed in 1.3.1 was the introduction of hashing of the R version + dependencies. Assuming you are running in parallel on the same machine, I wouldn't have expected different outcomes between the versions 😕. Part of me is still tempted to implement the proposed change, as there is still a theoretical race condition (as you mentioned) with the initial compile and on the occasions where a recompilation is triggered.
@gowerc, correct!
Not quite: you're also no longer copying the file if it's already there. I think the race condition is on the file copy operation. See 1.3.1: https://github.com/insightsengineering/rbmi/blob/v1.3.1/R/utilities.R#L580
Using
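If the race is on the file copy, a common remedy is to copy into a temporary file in the destination directory and then rename it into place, so other processes see either no file or the complete file, never a half-written one. The sketch below assumes a POSIX file system, where a rename within one directory is atomic; `atomic_copy()` is a hypothetical helper, not rbmi's implementation. Like 1.3.1, it skips the copy when the destination already exists.

```r
# Hypothetical helper: copy `src` to `dest` without ever exposing a partially
# written `dest`. The bytes go to a temporary file in the same directory first,
# then file.rename() moves it into place (atomic on POSIX within one file system).
atomic_copy <- function(src, dest) {
  if (file.exists(dest)) {
    return(invisible(FALSE)) # already present: nothing to do
  }
  tmp <- tempfile(pattern = ".partial-", tmpdir = dirname(dest))
  # Clean up the temporary file if anything below fails
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)
  if (!file.copy(src, tmp, overwrite = FALSE)) {
    stop("could not copy ", src, " to temporary file ", tmp)
  }
  # If another process renamed its own copy into place first, file.rename()
  # replaces it with an identical complete file
  if (!file.rename(tmp, dest) && !file.exists(dest)) {
    stop("could not move ", tmp, " to ", dest)
  }
  invisible(TRUE)
}
```

On Windows, `file.rename()` still replaces the target but atomicity is not guaranteed, so a lock around the copy would be the safer choice there.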
I did test 1.3.1 with ~100 cores recompiling the Stan model simultaneously and couldn't get any race condition to manifest, so this might be good enough.
Thanks for the follow-up testing! I'm tempted, then, to leave this as is... Though I guess at a minimum we should update the documentation to reflect what is actually going on, perhaps add a paragraph noting that there is a theoretical risk in parallel sessions, and provide an option to enable a session-based cache for parallel runs if the user wants more safety.
@gowerc the option already exists, but maybe documenting that if one wants fully independent compilation, setting
Describe the bug
The default cache path for rstan model causes rbmi to crash when run by the same user in parallel. This is caused by rstan recompiling the model each time (and race conditions occurring when this is done in parallel).
To Reproduce
Run the rbmi vignette on several cores using the same user account in parallel.
I saw that in rbmi 1.3.1 the model file now uses a hash, which should also help with this (see #459), but this is only a partial fix.
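The hash-based model file mentioned above can be sketched roughly as follows: combine the R version with the installed versions of the Stan-related dependencies into one string and hash it, so any upgrade yields a new file name and a fresh compile instead of reusing a stale model. The package list, the `rbmi_model_` prefix, and both helper names are assumptions for illustration, not rbmi's actual code.

```r
# Hypothetical sketch (not rbmi's actual code): build a cache key from the R
# version and the installed versions of selected packages, then hash it.
cache_key <- function(pkgs = c("rstan", "StanHeaders", "Rcpp")) {
  versions <- vapply(pkgs, function(p) {
    # packageVersion() throws an error for packages that are not installed
    tryCatch(as.character(utils::packageVersion(p)), error = function(e) "absent")
  }, character(1))
  key <- paste(c(R.version.string, paste0(pkgs, "=", versions)), collapse = ";")
  # tools::md5sum() hashes files, so write the key to a temporary file first
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(key, tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical file name for the cached compiled model
model_file_name <- function(pkgs = c("rstan", "StanHeaders", "Rcpp")) {
  paste0("rbmi_model_", cache_key(pkgs), ".rds")
}
```

A hash like this avoids stale models after upgrades, but two processes that compute the same name at the same time still race on creating that file, which is why it is only a partial fix.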
Environment:
Proposed solution
My proposal to fix this permanently is to change the default in `rbmi/R/utilities.R` (line 620 in 1d150e6) to `tempdir(check = TRUE)`, which is the per-session temporary directory, instead of the current default `tools::R_user_dir("rbmi", which = "cache")`, which is a persistent per-user cache directory.

The workaround for the current version is to set `options("rbmi.cache_dir" = tempdir(check = TRUE))`.

Tagging @bailliem for awareness.
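For parallel runs, the workaround has to be applied inside every worker, since each worker is a separate R process with its own options and its own `tempdir()`. The sketch below assumes rbmi reads the `rbmi.cache_dir` option when `draws()` runs, as the workaround implies; `resolve_cache_dir()`, `use_session_cache()` and `make_isolated_cluster()` are hypothetical helpers, and `resolve_cache_dir()` only mirrors the option-then-default lookup described in this issue.

```r
library(parallel)

# Hypothetical helper mirroring the lookup described in this issue: use the
# "rbmi.cache_dir" option when set, else the persistent per-user cache.
resolve_cache_dir <- function() {
  getOption("rbmi.cache_dir", default = tools::R_user_dir("rbmi", which = "cache"))
}

# Point the calling R session at its own per-session temporary directory
# and return that directory.
use_session_cache <- function() {
  options(rbmi.cache_dir = tempdir(check = TRUE))
  invisible(getOption("rbmi.cache_dir"))
}

# Hypothetical helper: start a PSOCK cluster and apply the workaround on every
# worker, so no two workers compile into, or copy over, the same file.
make_isolated_cluster <- function(n_workers, ...) {
  cl <- makeCluster(n_workers, ...)
  dirs <- unlist(clusterCall(cl, use_session_cache))
  if (anyDuplicated(dirs)) {
    stopCluster(cl)
    stop("workers unexpectedly share a cache directory")
  }
  cl
}
```

With this, `cl <- make_isolated_cluster(6, outfile = "")` could stand in for the `makeCluster()` call in the reproduction earlier in the thread. The trade-off is one compilation per worker, so the slower start-up is paid only in parallel runs while interactive users keep the persistent cache.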