Adding cumulative counts and percents to ard_categorical() (#373)

**What changes are proposed in this pull request?** * The `ard_categorical()` function can now return cumulative counts and percentages with `ard_categorical(statistic = varname ~ c('n_cum', 'p_cum'))`. (#145) **Reference GitHub issue associated with pull request.** _e.g., 'closes #<issue number>'_ closes #145 -------------------------------------------------------------------------------- Pre-review Checklist (if item does not apply, mark is as complete) - [x] **All** GitHub Action workflows pass with a ✅ - [x] PR branch has pulled the most recent updates from master branch: `usethis::pr_merge_main()` - [x] If a bug was fixed, a unit test was added. - [x] Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): `devtools::test_coverage()` - [x] Request a reviewer Reviewer Checklist (if item does not apply, mark is as complete) - [ ] If a bug was fixed, a unit test was added. - [ ] Run `pkgdown::build_site()`. Check the R console for errors, and review the rendered website. - [ ] Code coverage is suitable for any new functions/features: `devtools::test_coverage()` When the branch is ready to be merged: - [ ] Update `NEWS.md` with the changes from this pull request under the heading "`# cards (development version)`". If there is an issue associated with the pull request, reference it in parentheses at the end update (see `NEWS.md` for examples). - [ ] **All** GitHub Action workflows pass with a ✅ - [ ] Approve Pull Request - [ ] Merge the PR. Please use "Squash and merge" or "Rebase and merge". _Optional Reverse Dependency Checks_: - Install `checked` with `pak::pak("Genentech/checked")` or `pak::pak("checked")` - Check dev versions of `cardx`, `gtsummary`, and `tfrmt` which are in the `ddsjoberg` R Universe ```shell Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))" ``` - Check CRAN reverse dependencies but run tests skipped on CRAN ```shell Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', repos = 'https://cloud.r-project.org')" ``` - Check CRAN reverse dependencies in a CRAN-like environment ```shell Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', repos = 'https://cloud.r-project.org')" ``` --------- Co-authored-by: Becca Krouse <[email protected]>
insightsengineering · Jan 16, 2025 · e027ad7 · e027ad7
1 parent d9db1d4
commit e027ad7
Show file tree

Hide file tree

Showing 15 changed files with 396 additions and 73 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -33,17 +33,17 @@ _Optional Reverse Dependency Checks_:
 - Check dev versions of `cardx`, `gtsummary`, and `tfrmt` which are in the `ddsjoberg` R Universe
 
   ```shell
-  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))"
+  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2L, repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))"
   ```
 
 - Check CRAN reverse dependencies but run tests skipped on CRAN
 
   ```shell
-  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', repos = 'https://cloud.r-project.org')"
+  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"
   ```
 
 - Check CRAN reverse dependencies in a CRAN-like environment
 
   ```shell
-  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', repos = 'https://cloud.r-project.org')"
+  Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"
   ```
diff --git a/.github/workflows/test-coverage.yaml b/.github/workflows/test-coverage.yaml
@@ -4,9 +4,10 @@ on:
   push:
     branches: [main, master]
   pull_request:
-    branches: [main, master]
 
-name: test-coverage
+name: test-coverage.yaml
+
+permissions: read-all
 
 jobs:
   test-coverage:
@@ -15,36 +16,46 @@ jobs:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - uses: r-lib/actions/setup-r@v2
         with:
           use-public-rspm: true
 
       - uses: r-lib/actions/setup-r-dependencies@v2
         with:
-          extra-packages: any::covr
-          needs: coverage, check
+          extra-packages: any::covr, any::xml2
+          needs: coverage
 
       - name: Test coverage
         run: |
-          covr::codecov(
+          cov <- covr::package_coverage(
             quiet = FALSE,
             clean = FALSE,
-            install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package")
+            install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
           )
+          covr::to_cobertura(cov)
         shell: Rscript {0}
 
+      - uses: codecov/codecov-action@v4
+        with:
+          # Fail if error if not on PR, or if on PR and token is given
+          fail_ci_if_error: ${{ github.event_name != 'pull_request' || secrets.CODECOV_TOKEN }}
+          file: ./cobertura.xml
+          plugin: noop
+          disable_search: true
+          token: ${{ secrets.CODECOV_TOKEN }}
+
       - name: Show testthat output
         if: always()
         run: |
           ## --------------------------------------------------------------------
-          find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
+          find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
         shell: bash
 
       - name: Upload test results
         if: failure()
-        uses: actions/upload-artifact@v3
+        uses: actions/upload-artifact@v4
         with:
           name: coverage-test-failures
           path: ${{ runner.temp }}/package
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -31,7 +31,7 @@ Suggests:
     spelling (>= 2.2.0),
     testthat (>= 3.2.0),
     withr (>= 3.0.0)
-Config/Needs/check: hms
+Config/Needs/coverage: hms
 Config/Needs/website: rmarkdown, jsonlite, yaml, gtsummary, tfrmt,
     insightsengineering/nesttemplate
 Config/testthat/edition: 3

diff --git a/R/ard_categorical.R b/R/ard_categorical.R
@@ -17,13 +17,13 @@
 #'   Arguments may be used in conjunction with one another.
 #' @param variables ([`tidy-select`][dplyr::dplyr_tidy_select])\cr
 #'   columns to include in summaries. Default is `everything()`.
-#' @param denominator (`data.frame`, `integer`)\cr
-#'   Specify this *optional* argument to change the denominator,
-#'   e.g. the `"N"` statistic. Default is `NULL`. See below for details.
+#' @param denominator (`string`, `data.frame`, `integer`)\cr
+#'   Specify this argument to change the denominator,
+#'   e.g. the `"N"` statistic. Default is `'column'`. See below for details.
 #' @param statistic ([`formula-list-selector`][syntax])\cr
 #'   a named list, a list of formulas,
-#'   or a single formula where the list element one or more of  `c("n", "N", "p")`
-#'   (or the RHS of a formula).
+#'   or a single formula where the list element one or more of  `c("n", "N", "p", "n_cum", "p_cum")`
+#'   (on the RHS of a formula).
 #' @param stat_label ([`formula-list-selector`][syntax])\cr
 #'   a named list, a list of formulas, or a single formula where
 #'   the list element is either a named list or a list of formulas defining the
@@ -45,14 +45,18 @@
 #' In such cases, use the `denominator` argument to specify a new definition
 #' of `"N"`, and subsequently `"p"`.
 #' The argument expects one of the following inputs:
+#' - a string: one of `"column"`, `"row"`, or `"cell"`.
+#'     - `"column"`, the default, returns percentages where the sum is equal to
+#'        one within the variable after the data frame has been subset with `by`/`strata`.
+#'     - `"row"` gives 'row' percentages where `by`/`strata` columns are the 'top'
+#'        of a cross table, and the variables are the rows. This is well-defined
+#'        for a single `by` or `strata` variable, and care must be taken when there
+#'        are more to ensure the the results are as you expect.
+#'   - `"cell"` gives percentages where the denominator is the number of non-missing
+#'        rows in the source data frame.
 #' - a data frame. Any columns in the data frame that overlap with the `by`/`strata`
 #'   columns will be used to calculate the new `"N"`.
 #' - an integer. This single integer will be used as the new `"N"`
-#' - a string: one of `"column"`, `"row"`, or `"cell"`. `"column"` is equivalent
-#'   to `denominator=NULL`. `"row"` gives 'row' percentages where `by`/`strata`
-#'   columns are the 'top' of a cross table, and the variables are the rows.
-#'   `"cell"` gives percentages where the denominator is the number of non-missing
-#'   rows in the source data frame.
 #' - a structured data frame. The data frame will include columns from `by`/`strata`.
 #'   The last column must be named `"...ard_N..."`. The integers in this column will
 #'   be used as the updated `"N"` in the calculations.
@@ -104,7 +108,7 @@ ard_categorical.data.frame <- function(data,
                                        by = dplyr::group_vars(data),
                                        strata = NULL,
                                        statistic = everything() ~ c("n", "p", "N"),
-                                       denominator = NULL,
+                                       denominator = "column",
                                        fmt_fn = NULL,
                                        stat_label = everything() ~ default_stat_labels(),
                                        ...) {
@@ -137,8 +141,8 @@ ard_categorical.data.frame <- function(data,
   )
   check_list_elements(
     x = statistic,
-    predicate = \(x) is.character(x) && all(x %in% c("n", "p", "N")),
-    error_msg = "Elements passed in the {.arg statistic} argument must be one or more of {.val {c('n', 'p', 'N')}}"
+    predicate = \(x) is.character(x) && all(x %in% c("n", "p", "N", "n_cum", "p_cum")),
+    error_msg = "Elements passed in the {.arg statistic} argument must be one or more of {.val {c('n', 'p', 'N', 'n_cum', 'p_cum')}}"
   )
 
   # return empty ARD if no variables selected ----------------------------------
@@ -247,7 +251,7 @@ ard_categorical.data.frame <- function(data,
         imap(
           statistics_tabulation,
           function(x, variable) {
-            if (any(c("N", "p") %in% x[["tabulation"]])) {
+            if (any(c("N", "p", "p_cum") %in% x[["tabulation"]])) {
               TRUE
             } else {
               NULL
@@ -282,22 +286,32 @@ ard_categorical.data.frame <- function(data,
               ))
             }
         }
-        if ("p" %in% tab_stats[["tabulation"]]) {
+        if (any(c("p", "p_cum") %in% tab_stats[["tabulation"]])) {
           df_result_tabulation <-
             df_result_tabulation |>
             dplyr::mutate(
               ...ard_p... = .data$...ard_n... / .data$...ard_N...
             )
         }
 
+        df_result_tabulation <-
+          .add_cum_count_stats(
+            df_result_tabulation,
+            variable = variable,
+            by = by,
+            strata = strata,
+            denominator = denominator,
+            tab_stats = tab_stats
+          )
+
         df_result_tabulation |>
           .nesting_rename_ard_columns(variable = variable, by = by, strata = strata) |>
           dplyr::mutate(
-            across(any_of(c("...ard_n...", "...ard_N...", "...ard_p...")), as.list),
+            across(any_of(c("...ard_n...", "...ard_N...", "...ard_p...", "...ard_n_cum...", "...ard_p_cum...")), as.list),
             across(c(matches("^group[0-9]+_level$"), any_of("variable_level")), as.list)
           ) |>
           tidyr::pivot_longer(
-            cols = any_of(c("...ard_n...", "...ard_N...", "...ard_p...")),
+            cols = any_of(c("...ard_n...", "...ard_N...", "...ard_p...", "...ard_n_cum...", "...ard_p_cum...")),
             names_to = "stat_name",
             values_to = "stat"
           ) |>
@@ -334,6 +348,52 @@ ard_categorical.data.frame <- function(data,
   )
 }
 
+
+
+.add_cum_count_stats <- function(x, variable, by, strata, denominator, tab_stats) {
+  # if no cumulative stats were requested, return the object
+  if (!any(c("p_cum", "n_cum") %in% tab_stats[["tabulation"]])) {
+    return(x)
+  }
+
+  # to return cumulative stats, the denominator must be 'column' or 'row'
+  if (!is_string(denominator) || !denominator %in% c("column", "row")) {
+    cli::cli_abort(
+      "The {.arg denominator} argument must be one of {.val {c(\"column\", \"row\")}}
+       when cumulative statistics {.val n_cum} or {.val p_cum} are specified, which
+       were requested for variable {.var {variable}}.",
+      call = get_cli_abort_call()
+    )
+  }
+
+  # calculate the cumulative statistics
+  if (denominator %in% "column") {
+    x <- x |>
+      dplyr::mutate(
+        .by = any_of(c(by, strata)),
+        ...ard_n_cum... = switch("n_cum" %in% tab_stats[["tabulation"]],
+          cumsum(.data$...ard_n...)
+        ),
+        ...ard_p_cum... = switch("p_cum" %in% tab_stats[["tabulation"]],
+          cumsum(.data$...ard_p...)
+        )
+      )
+  } else if (denominator %in% "row") {
+    x <- x |>
+      dplyr::mutate(
+        .by = any_of(variable),
+        ...ard_n_cum... = switch("n_cum" %in% tab_stats[["tabulation"]],
+          cumsum(.data$...ard_n...)
+        ),
+        ...ard_p_cum... = switch("p_cum" %in% tab_stats[["tabulation"]],
+          cumsum(.data$...ard_p...)
+        )
+      )
+  }
+
+  x
+}
+
 #' Results from `table()` as Data Frame
 #'
 #' Takes the results from [table()] and returns them as a data frame.

diff --git a/R/default_stat_labels.R b/R/default_stat_labels.R
@@ -21,6 +21,8 @@ default_stat_labels <- function() {
     n = "n",
     N = "N",
     p = "%",
+    n_cum = "Cumulative n",
+    p_cum = "Cumulative %",
     N_obs = "Vector Length",
     N_miss = "N Missing",
     N_nonmiss = "N Non-missing",

diff --git a/README.Rmd b/README.Rmd
@@ -17,7 +17,7 @@ knitr::opts_chunk$set(
 
 <!-- badges: start -->
 [![CRAN status](https://www.r-pkg.org/badges/version/cards)](https://CRAN.R-project.org/package=cards)
-[![Codecov test coverage](https://codecov.io/gh/insightsengineering/cards/branch/main/graph/badge.svg)](https://app.codecov.io/gh/insightsengineering/cards?branch=main)
+[![Codecov test coverage](https://codecov.io/gh/insightsengineering/cards/graph/badge.svg)](https://app.codecov.io/gh/insightsengineering/cards)
 [![Downloads](https://cranlogs.r-pkg.org/badges/cards)](https://cran.r-project.org/package=cards)
 [![R-CMD-check](https://github.com/insightsengineering/cards/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/insightsengineering/cards/actions/workflows/R-CMD-check.yaml)
 [![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)

diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@
 [![CRAN
 status](https://www.r-pkg.org/badges/version/cards)](https://CRAN.R-project.org/package=cards)
 [![Codecov test
-coverage](https://codecov.io/gh/insightsengineering/cards/branch/main/graph/badge.svg)](https://app.codecov.io/gh/insightsengineering/cards?branch=main)
+coverage](https://codecov.io/gh/insightsengineering/cards/graph/badge.svg)](https://app.codecov.io/gh/insightsengineering/cards)
 [![Downloads](https://cranlogs.r-pkg.org/badges/cards)](https://cran.r-project.org/package=cards)
 [![R-CMD-check](https://github.com/insightsengineering/cards/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/insightsengineering/cards/actions/workflows/R-CMD-check.yaml)
 [![Lifecycle:

diff --git a/man/ard_categorical.Rd b/man/ard_categorical.Rd
diff --git a/man/ard_dichotomous.Rd b/man/ard_dichotomous.Rd