Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expectation of factors in DMT01 #675

Open
barnett11 opened this issue Oct 16, 2023 · 8 comments
Open

Expectation of factors in DMT01 #675

barnett11 opened this issue Oct 16, 2023 · 8 comments
Labels

Comments

@barnett11
Copy link
Contributor

Hi,
This issue covers beyond DMT01, but primarily impacts this extremely common table.
We can pass many variables into DMT01 to summarize (normally Demographic and Baseline Characteristics). If a category of a variable is present in one treatment group, but not another, then DMT01 fails and requires the variable to be passed as factor. This is a little erratic, especially as it will happen entirely dependent on data and its spread across treatment groups - and will eg. cause an error as data changes, or when applying filtering.
I understand this is not a chevron issue, but broader - but is there a better way we can process these types of templates so that the user is not forced to keep updating columns to factors just to display 0 under a treatment group?

@BFalquet
Copy link
Contributor

@clarkliming

@clarkliming
Copy link
Contributor

I totally agree that this should be fixed. @shajoezhu is it possible to fix this in rtables?

@clarkliming
Copy link
Contributor

depending on the requirement, it has different solutions.

  1. if empty space is needed, then we may want to update rtables so that even different stats they can be used (and non-existing will be shown as empty fields. However, there is a need to smartly pick up the order the these stats, e.g. col1 has a,b, col 2 has b, c we need to infer that all stats are a,b, c.
  2. if 0 is needed, then probably we can stop any "character" input and only allow "factor".
  3. Or, instead of really analyzing within each "row and column" split independently, we need to know the data in each column split. Then in a_summary a something like df_rows is needed to know all the levels

current solution: there is already a format related thing to convert character to factor. If new variablee need to be added, of course user can update the formats file. If new levels comes in, if proper format is used, the 0 will be still displayed

@clarkliming
Copy link
Contributor

https://github.com/insightsengineering/chevron.adr/blob/main/docs/decisions/0009-chevron-only-factor.md

@BFalquet @barnett11 please have a look at the proposal

@BFalquet please help evaluate the impact of only allowing factor in chevron
@barnett11 please also consider if it is fine to only allow factor in chevron?

@BFalquet
Copy link
Contributor

I agree with the diagnosis, we can easily convert to character with the risk of loosing (or at least not gaining) information about the missing levels. We can probably mitigate the risk by creating reformatting rules that introduces level that we know should be present (maybe possible for the different flag and other very controlled variables). For more exotic variables, unfortunately, there is not much we can do.

@clarkliming
Copy link
Contributor

@BFalquet it is converting all character to factor, so won't lose any levels

@barnett11
Copy link
Contributor Author

We had this originally is that correct, with a function to convert the dataset being read in to all factors (was it df_explicit_na?). Would this be the advised way again then, and why did we remove this before (I can't remember sorry)

@clarkliming
Copy link
Contributor

We had this originally is that correct, with a function to convert the dataset being read in to all factors (was it df_explicit_na?). Would this be the advised way again then, and why did we remove this before (I can't remember sorry)

df_explicit_na is doing to much things and all NA values will be treated identically (to be converted to something else) which is not ideal I believe. And a lot of analysis does not require factor I think, e.g. for counting "Y" or "N", so this step is not added to our script.

However now, we may consider at least convert those "analyze_vars" to use "factor" only, and users should take care of the "character-factor" conversion (for odering , keep missing levels, etc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants