From 5ec9dd68d6dcf58fb10574cbdd1cc4ceefa4b724 Mon Sep 17 00:00:00 2001
From: Chitu Okoli
Date: Tue, 13 Feb 2024 12:58:40 +0100
Subject: [PATCH] Clean-up for 0.3.0 release

---
 vignettes/ale-intro.Rmd            | 14 ++++++--------
 vignettes/articles/ale-ALEPlot.Rmd | 12 ++++--------
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/vignettes/ale-intro.Rmd b/vignettes/ale-intro.Rmd
index 08e243a..27d7c3f 100644
--- a/vignettes/ale-intro.Rmd
+++ b/vignettes/ale-intro.Rmd
@@ -126,17 +126,16 @@ By default, most core functions in the `{ale}` package use parallel processing.
 
 To access the plot for a specific variable, we can call it by its variable name as an element of the `plots` element. These are `ggplot` objects, so they are easy to manipulate. For example, to access and print the `carat` ALE plot, we simply call `ale_gam_diamonds$plots$carat` :
 
-```{r print carat, fig.width=3.5, fig.width=4}
+```{r print-carat, fig.width=3.5, fig.width=4}
 # Print a plot by entering its reference
 ale_gam_diamonds$plots$carat
 ```
 
 To iterate the list and plot all the ALE plots, we provide here some demonstration code using the `patchwork` package for arranging multiple plots in a common plot grid using `patchwork::wrap_plots()`. We need to pass the list of plots to the `grobs` argument and we can specify that we want two plots per row with the `ncol` argument.
 
-```{r print ale_simple, fig.width=7, fig.height=11}
+```{r print-ale_simple, fig.width=7, fig.height=11}
 # Print all plots
-ale_gam_diamonds$plots |>
-  patchwork::wrap_plots(ncol = 2)
+patchwork::wrap_plots(ale_gam_diamonds$plots, ncol = 2)
 ```
 
 ## Bootstrapped ALE
@@ -164,8 +163,7 @@ ale_gam_diamonds_boot <- ale(
 )
 
 # Bootstrapping produces confidence intervals
-ale_gam_diamonds_boot$plots |>
-  patchwork::wrap_plots(ncol = 2)
+patchwork::wrap_plots(ale_gam_diamonds_boot$plots, ncol = 2)
 ```
 
 In this case, the bootstrapped results are mostly similar to single (non-bootstrapped) ALE result. In principle, we should always bootstrap the results and trust only in bootstrapped results. The most unusual result is that values of `x_length` (the length of the diamond) from 6.2 mm or so and higher are associated with lower diamond prices. When we compare this with the `y_width` value (width of the diamond), we suspect that when both the length and width (that is, the size) of a diamond become increasingly large, the price increases so much more rapidly with the width than with the length that the width has an inordinately high effect that is tempered by a decreased effect of the length at those high values. This would be worth further exploration for real analysis, but here we are just introducing the key features of the package.
@@ -187,7 +185,7 @@ Like the `ale()` function, the `ale_ixn()` returns a list with one element per i
 
 Again, we provide here some demonstration code to plot all the ALE plots. It is a little more complex this time because of the two levels of interacting variables in the output data, so we use the `purrr` package to iterate the list structure. `purrr::walk()` takes a list as its first argument and then we specify an anonymous function for what we want to do with each element of the list. We specify the anonymous function as `\(.x1) {...}` where `.x1` in our case represents each individual element of `ale_ixn_gam_diamonds$plots` in turn, that is, a sublist of plots with which the x1 variable interacts. We print the plots of all the x1 interactions as a combined grid of plots with `patchwork::wrap_plots()`, as before.
 
-```{r print all ale_ixn, fig.width=7, fig.height=7}
+```{r print-all-ale_ixn, fig.width=7, fig.height=7}
 # Print all interaction plots
 ale_ixn_gam_diamonds$plots |>
   # extract list of x1 ALE outputs
@@ -201,7 +199,7 @@ ale_ixn_gam_diamonds$plots |>
 
 Because we are printing all plots together with the same `patchwork::wrap_plots()` statement, some of them might appear vertically distorted because each plot is forced to be of the same height. For more fine-tuned presentation, we would need to refer to a specific plot. For example, we can print the interaction plot between carat and depth by referring to it thus: `ale_ixn_gam_diamonds$plots$carat$depth`.
 
-```{r print specific ixn, fig.width=5, fig.height=3}
+```{r print-specific-ixn, fig.width=5, fig.height=3}
 ale_ixn_gam_diamonds$plots$carat$depth
 ```
 
diff --git a/vignettes/articles/ale-ALEPlot.Rmd b/vignettes/articles/ale-ALEPlot.Rmd
index 288d73b..6c28580 100644
--- a/vignettes/articles/ale-ALEPlot.Rmd
+++ b/vignettes/articles/ale-ALEPlot.Rmd
@@ -160,8 +160,7 @@ Since the plots are saved as a list, they can easily be printed out all at once:
 
 ```{r ale nnet one-way plots, fig.width=7, fig.height=5}
 # Print plots
-nn_ale$plots |>
-  patchwork::wrap_plots()
+patchwork::wrap_plots(nn_ale$plots)
 ```
 
 The `{ale}` package plots have various features that enhance interpretability:
@@ -177,8 +176,7 @@ It might not be clear that the previous plots display exactly the same data as t
 # Zero-centred ALE
 nn_ale <- ale(DAT, nnet.DAT, pred_type = "raw", relative_y = 'zero')
 
-nn_ale$plots |>
-  patchwork::wrap_plots()
+patchwork::wrap_plots(nn_ale$plots)
 ```
 
 With these zero-centred plots, the full range of y values and the rug plots give some context that aids interpretation. (If the rugs look slightly different, it is because they are randomly jittered to avoid overplotting.)
@@ -302,8 +300,7 @@ gbm_ale_link <- url('https://github.com/tripartio/ale/raw/main/download/gbm_ale_
   readRDS()
 
 # Print plots
-gbm_ale_link$plots |>
-  patchwork::wrap_plots(ncol = 2)
+patchwork::wrap_plots(gbm_ale_link$plots, ncol = 2)
 ```
 
 Now we generate ALE data for all two-way interactions and then plot them. Again, note the interaction between `age` and `hours_per_week`. The interaction is minimal except for the extremely high cases of hours per week.
@@ -374,8 +371,7 @@ gbm_ale_prob <- url('https://github.com/tripartio/ale/raw/main/download/gbm_ale_
   readRDS()
 
 # Print plots
-gbm_ale_prob$plots |>
-  patchwork::wrap_plots(ncol = 2)
+patchwork::wrap_plots(gbm_ale_prob$plots, ncol = 2)
 ```
 
 Finally, we again generate two-way interactions, this time based on probabilities instead of on log odds. However, probabilities might not be the best choice for indicating interactions because, as we see from the rugs in the one-way ALE plots, the GBM model heavily concentrates its probabilities in the extremes near 0 and 1. Thus, the plots' suggestions of strong interactions are likely exaggerated. In this case, the log odds ALEs shown above are probably more relevant.
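
Reviewer note (not part of the patch): the pipe removals in the hunks above should be behaviour-preserving, because R's native pipe (R 4.1 or later) rewrites `x |> f(args)` into `f(x, args)` at parse time. The following is a minimal sketch of that equivalence, not code from the package; `wrap_stub()` and the `plots` list are hypothetical stand-ins so that the example runs without `{ale}` or `patchwork` installed.

```r
# Hypothetical stand-in for patchwork::wrap_plots(): it only records
# how many plots it received and the requested number of columns.
wrap_stub <- function(plots, ncol = NULL) {
  list(n_plots = length(plots), ncol = ncol)
}

# Hypothetical stand-in for a `$plots` list (real elements would be ggplot objects)
plots <- list(carat = "p1", depth = "p2", table = "p3")

# Old style from the removed lines: pipe the list into the function
piped <- plots |>
  wrap_stub(ncol = 2)

# New style from the added lines: pass the list as the first argument
direct <- wrap_stub(plots, ncol = 2)

# Both forms produce the same result
stopifnot(identical(piped, direct))

# The native pipe is pure syntax: the parser produces the direct call
same_call <- identical(
  quote(plots |> wrap_stub(ncol = 2)),
  quote(wrap_stub(plots, ncol = 2))
)
stopifnot(same_call)
```

Because the two spellings parse to the same call, the change is purely stylistic as far as evaluation goes; the direct form simply avoids requiring the `|>` operator in these chunks.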