Feature Request: More customization with ard_hierarchical() #375

Closed
daniel-woodie opened this issue Jan 15, 2025 · 9 comments

Comments

@daniel-woodie

daniel-woodie commented Jan 15, 2025

ard_hierarchical custom numerators + additional statistics

I'm trying to replicate a scenario with ard_hierarchical and am struggling. I suspect it's possible, but it's hard to build intuition for how this function works and how to customize the statistics it calculates.

For example, I'd like my output ARDs to look something like:

| group1_level | group2_level | group3_level | stat_name | stat |
|---|---|---|---|---|
| TRTA | Subgroup A | Visit 1 | n | 3 |
| TRTA | Subgroup A | Visit 1 | N | 10 |
| TRTA | Subgroup A | Visit 1 | p | .3 |
| TRTA | Subgroup A | Visit 2 | n | 6 |
| TRTA | Subgroup A | Visit 2 | N | 11 |
| TRTA | Subgroup A | Visit 2 | p | .545 |

In theory, I'd like something like:

```r
ard_hierarchical(
  data = my_adam,
  by = trt,
  variables = c(subgroup, visit, some_binary_outcome),
  denominator = my_adam |> group_by(trt, subgroup, visit) |> summarise(some_binary_outcome = n())
)
```
Sorry it's not a minimal reproducible example, but hopefully that makes sense. Any thoughts appreciated. Additionally, if you have any other guidance on how to use the statistic parameter to calculate custom statistics, that'd be awesome. I checked out the tidyselect section of the documentation but didn't get a whole lot from it.

@daniel-woodie changed the title from "Feature Request: <Description>" to "Feature Request: More customization with ard_hierarchical()" (Jan 15, 2025)
@ddsjoberg
Collaborator

Hi @daniel-woodie ! Thanks for the post!

Instead of only showing the structure of the output you're after, could you also provide a description of the summary you are trying to calculate? Also, you can use the prepared ADaM data sets in the pharmaverseadam R package to create examples (https://pharmaverse.github.io/pharmaverseadam/).

My suspicion is that you may want to use ard_strata() or even ard_categorical() instead, but it's hard to say without more context.
FYI @jtalboys is currently writing a vignette on prepping ARDs for long datasets, which looks like it will be helpful for you when it's published.

@daniel-woodie
Author

daniel-woodie commented Jan 16, 2025

I was actually able to create the output by breaking down the dataset into smaller chunks and using a combination of ard_dichotomous and ard_categorical_ci.

I'm not sure how useful it is but here's a variation on what I did using some dummy CDISC data (adapted from an example in the TLG gallery).

```r
library(cards)
library(cardx)
library(dplyr)
library(tern)               # for var_relabel()
library(random.cdisc.data)

set.seed(2025)              # make the simulated BINARY_OUTCOME reproducible
adsl <- random.cdisc.data::cadsl

## Prep the data some
adsl <- adsl |> 
  mutate(
    SEX = factor(case_when(
      SEX == "M" ~ "Male",
      SEX == "F" ~ "Female",
      SEX == "U" ~ "Unknown",
      SEX == "UNDIFFERENTIATED" ~ "Undifferentiated"
    )),
    AGEGR1 = factor(
      case_when(
        between(AGE, 18, 40) ~ "18-40",
        between(AGE, 41, 64) ~ "41-64",
        AGE > 64 ~ ">=65"
      ),
      levels = c("18-40", "41-64", ">=65")
    ),
    BMRKR1_CAT = factor(
      case_when(
        BMRKR1 < 3.5 ~ "LOW",
        BMRKR1 >= 3.5 & BMRKR1 < 10 ~ "MEDIUM",
        BMRKR1 >= 10 ~ "HIGH"
      ),
      levels = c("LOW", "MEDIUM", "HIGH")
    )
  ) |> 
  var_relabel(
    BMRKR1_CAT = "Biomarker 1 Categories"
  ) |> 
  mutate(BINARY_OUTCOME = rbinom(n = n(), size = 1, prob = .5))

## Get the big Ns
cards_big_n <- adsl |> 
  ard_categorical(variables = TRT01P, statistic = everything() ~ "n") |> 
  rename(group1 = variable,
         group1_level = variable_level)

## n, N, p
cards_ards <- ard_dichotomous(data = adsl |> filter(BMRKR1_CAT == "LOW"), 
                              by = c(TRT01P, SEX, AGEGR1, BMRKR1_CAT), 
                              variables = c(BINARY_OUTCOME))

## Wald CIs for the proportion
cards_ards_stat <- ard_categorical_ci(data = adsl |> filter(BMRKR1_CAT == "LOW"), 
                                      by = c(TRT01P, SEX, AGEGR1, BMRKR1_CAT), 
                                      variables = c(BINARY_OUTCOME),
                                      method = "wald") |> 
  filter(stat_name %in% c("conf.low", "conf.high"))

## Stack everything
final_cards_ards <- bind_ard(cards_big_n, cards_ards, cards_ards_stat)
```

Hopefully it's not too confusing: it's a toy variation on what I did with real data, trimmed down to be concise. However kludgy it looks, it did the trick haha. From there it was largely an exercise in formatting.

A few questions came up while I was working on this:

  1. What function should I use and why? ard_hierarchical vs ard_categorical vs ard_strata. Additionally, I ultimately landed on none of the above and a combination of ard_dichotomous + ard_categorical_ci ended up being what I chose.
  2. More examples about how the denominator works. What are the different ways you can use this parameter to swap out the denominator?
  3. When should you break up something into separate function calls vs cramming everything into a single function?
  4. How do you use the statistic parameter? Having more examples of how to use this would be super helpful.

Anyway, thanks for making such an awesome package and for being responsive in the issues!

@ddsjoberg
Collaborator

> 1. What function should I use and why? ard_hierarchical vs ard_categorical vs ard_strata. Additionally, I ultimately landed on none of the above and a combination of ard_dichotomous + ard_categorical_ci ended up being what I chose.

I think of ard_hierarchical() primarily for AE tables and con med tables. ard_strata() is useful for running the same tabulation/analysis within subgroups.
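A minimal sketch of both patterns, assuming the `ADSL` and `ADAE` example data sets shipped with {cards} and the function signatures available around the time of this thread (argument names may differ in later releases). The `distinct()` step is an assumption about how to get subject-level counts, not something stated above.

```r
library(cards)
library(dplyr)

# AE-style hierarchy: preferred term (AEDECOD) nested within SOC (AESOC), by arm.
# Keep one row per subject per SOC/term so that n counts subjects, not events.
ae_ard <- ard_hierarchical(
  data = ADAE |> distinct(USUBJID, TRTA, AESOC, AEDECOD),
  variables = c(AESOC, AEDECOD),
  by = TRTA,
  denominator = ADSL |> mutate(TRTA = ARM),  # arm Ns come from ADSL
  id = USUBJID
)

# Subgroup analysis: run the same tabulation separately within each level of SEX.
strata_ard <- ard_strata(
  ADSL,
  .by = SEX,
  .f = ~ ard_categorical(.x, by = ARM, variables = AGEGR1)
)
```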

> 2. More examples about how the denominator works. What are the different ways you can use this parameter to swap out the denominator?

We are working on putting together more documentation and examples. We'll get there!
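Until those docs land, here is a hedged sketch of the character options for `denominator` in `ard_categorical()`, assuming cards' `ADSL` example data. They control which total is used as N when computing `p`. The argument also accepts a data frame, in which case N for each `by` level is counted from that data frame instead of from `data` (as in the `ard_hierarchical()` AE example, where ADSL supplies the arm Ns).

```r
library(cards)

# "column" (default): N is the total within each ARM level
col_ard <- ard_categorical(ADSL, by = ARM, variables = AGEGR1, denominator = "column")

# "row": N is the total within each AGEGR1 level, across all arms
row_ard <- ard_categorical(ADSL, by = ARM, variables = AGEGR1, denominator = "row")

# "cell": N is the overall number of rows in the data
cell_ard <- ard_categorical(ADSL, by = ARM, variables = AGEGR1, denominator = "cell")
```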

> 4. How do you use the statistic parameter? Having more examples of how to use this would be super helpful.

Sure, we can add that!
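A sketch of common `statistic` patterns, assuming cards' `ADSL` example data. The formula's left-hand side is a tidyselect expression choosing variables (an empty left-hand side means all variables); the right-hand side lists the statistics. For continuous variables, a custom statistic is written here as a named list of functions of the variable; the `cv` name is purely illustrative.

```r
library(cards)

# Categorical: keep only n and p for every variable
cat_ard <- ard_categorical(
  ADSL,
  by = ARM,
  variables = c(AGEGR1, SEX),
  statistic = everything() ~ c("n", "p")
)

# Continuous: choose built-in summaries with continuous_summary_fns()
cont_ard <- ard_continuous(
  ADSL,
  by = ARM,
  variables = AGE,
  statistic = ~ continuous_summary_fns(c("N", "mean", "sd"))
)

# Continuous: a custom statistic (coefficient of variation) as a named function
cv_ard <- ard_continuous(
  ADSL,
  by = ARM,
  variables = AGE,
  statistic = ~ list(cv = function(x) sd(x) / mean(x))
)
```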

@daniel-woodie
Author

Gotcha, thanks! And yes, I'm happy to contribute back if it's helpful and doesn't get in the way.

Regarding the set of functions I ultimately landed on, here are a few reasons why:

  1. For big Ns, I wanted this value for each treatment group. I know there's a function ard_total_n() and I thought it would do the trick, but it appears to calculate only the overall N (it's in the name, I know, sorry haha). I don't think it'd be a big stretch to add the ability to tally the N for each treatment group. Also, as you can see, I had to rename the columns because of how I did some of the grouping downstream. I wouldn't have had to do this if I could have used the by parameter instead of the variables parameter. Not sure what the best answer is here, but that's why the code looks the way it does for this step.
  2. ard_dichotomous gave me most of what I needed for n, N, and p for my groupings. I filtered the data because of how I chunked up the ARDs.
  3. ard_categorical_ci gave me the confidence intervals I needed, and it also returned some counts. I probably could have used it by itself (i.e. without ard_dichotomous), but it didn't return the numerator (n) that ard_dichotomous gives.
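For comparison, a sketch of the two ways of getting Ns discussed here, assuming cards' `ADSL` example data: `ard_total_n()` returns only the overall N, while tabulating the arm variable itself gives one `n` per arm.

```r
library(cards)

# Overall N for the whole data set
total_n <- ard_total_n(ADSL)

# N for each treatment arm: tabulate the arm variable itself
arm_n <- ard_categorical(ADSL, variables = ARM, statistic = everything() ~ "n")
```

In the per-arm result, the arm lives in `variable`/`variable_level` rather than in the `group` columns, which is why it doesn't line up with ARDs that use `by = ARM` when stacked.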

@ddsjoberg
Collaborator

Regarding the big Ns: ARDs are meant to be results only and are divorced from how the results are displayed downstream. While you can rename the columns, the ARD that results from ard_categorical(adsl, variables = TRT01P, statistic = everything() ~ "n") is exactly what we need. Is there a reason you needed to rename in your use case?

We've considered adding 'n' to the list of returned stats in ard_categorical_ci(). Sounds like we should go ahead and do it.

@daniel-woodie
Author

Thanks for adding 'n'! Regarding big Ns: so far the only downstream "display" is the ARDs themselves; I haven't gotten into displaying them in a table. I renamed the columns because the treatment columns don't line up in the final ARD (the bind_ard() step). See the code example below.

Just do:
```r
## adsl as prepared in my earlier comment

## Get the big Ns
cards_big_n <- adsl |> 
  ard_categorical(variables = TRT01P, statistic = everything() ~ "n") |> 
  rename(group1 = variable,
         group1_level = variable_level)

cards_big_n_no_rename <- adsl |> 
  ard_categorical(variables = TRT01P, statistic = everything() ~ "n")

## n, N, p
cards_ards <- ard_dichotomous(data = adsl |> filter(BMRKR1_CAT == "LOW"), 
                              by = c(TRT01P, SEX, AGEGR1, BMRKR1_CAT), 
                              variables = c(BINARY_OUTCOME))

## Stack everything
bind_ard(cards_big_n, cards_ards)            # renamed: TRT01P sits in group1/group1_level
bind_ard(cards_big_n_no_rename, cards_ards)  # not renamed: TRT01P sits in variable/variable_level
```

It's not a big deal to rename the columns. My first thought, however, was that I should be able to specify the 'by' parameter like I've done in the other step.

@ddsjoberg
Collaborator

While you are playing around with the functions, I would recommend first trying to use the results as they come. That is, when you need the Ns for the treatment groups, tabulate TRT01P and use the result as it is. I think that will serve you better in the long run. This structure is also great for QC across many tables and figures: no matter how the TRT01P counts are reported in the table (e.g. across the header, or lengthwise in the body of a table), the ARD for TRT01P always looks exactly the same, which makes it easy to check the counts for consistency across all tables/figures in a CSR.

I can see what you're saying about the names after you stack the ARDs. Also consider that ARDs don't need to be stacked. In some cases stacking makes perfect sense, but perhaps in your case there is no need to stack the TRT01P counts with the other stratified statistics, if the results are easier to digest unstacked.

@daniel-woodie
Author

If stacking my ARDs makes me wrong then I don't think I want to be right.

Joking. Thanks for all the help! Loving all the progress with {cards}/{cardx}. I'll keep playing around with it.

@ddsjoberg
Collaborator

Ha! I love stacking too!

But there are situations (particularly in cardx) where stacking probably won't be a good option for us. More to come on that point in the future!
