Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifies JoinKeys from R6 to S3 list-like object #184

Merged
merged 155 commits into from
Nov 20, 2023

Conversation

averissimo
Copy link
Contributor

@averissimo averissimo commented Oct 30, 2023

Pull Request

R6 object has been removed in favor of a list-like object

🏗️ WIP

Breaking changes:

  • Setter can be used multiple times join_keys(obj) <- list(join_key(...), ...)
  • Removed functions:
    • merge_join_keys: can be done in other ways: join_keys(jk) <- join_keys(...) (was r6$merge(x,y)`)
    • split_join_keys: Not relevant (was r6$split())
    • get_join_key: Can be done with [ or [[ (was r6$get(...))
  • Non-existing keys will return NULL instead of character(0)
  • jk["idx"] will output a join_key with all symmetric keys (if there are any)

Notable changes:

⇒ New class is an extension of a list.

  • Can be used directly to access the join key pairs obj[[dataset_1]][[dataset_2]]

JoinKeys mutations are no longer done by memory reference

join_keys(x) <- ... tries to minimize this, but mutate_join_key method and others may need to re-assign the join_keys.

⇒ All previous tests were converted and pass (this adds 1% to code coverage)

⇒ Setter can be used multiple times join_keys(obj) <- list(join_key(...), ...)

  • New behaviour merges the dataset, doesn't overwrite
  • Previously, it could only be done once and then it threw an error message
  • This is a side-effect of supporting the following notation: join_keys(obj)["ds1", "ds2"] <- "id"

How to use?

# Constructor `join_keys()` creates empty
jk <- join_keys(
  join_key("ds1", "ds2", "col1"), 
  join_key("ds3", "ds4", "col2")
)

# Getter takes `JoinKey` or `teal_data`
td <- teal_data()
join_keys(td)

# Setter
join_keys(jk) <- join_key("ds5", "ds6", "col3")
join_keys(td) <- join_key("ds5", "ds6", "col3") # note that it takes `teal_data`

# Merge
jk2 <- join_keys(
  join_key("not", "in", "jk")
)
join_keys(jk2) <- jk

# Subset (different methods)
jk["ds1"]
jk[1:2]
jk[["ds1"]]

# Subset assignment (4 different methods)
jk["ds1", "ds2"] <- "col1_method1"
jk[["ds1", "ds2"]] <- "col1_method2"
jk[["ds1"]][["ds2"]] <- "col1_method3"
jk[["ds1"]] <- list("ds2" = "col1_method4")

# Parents
parents(jk) <- list("ds1" = "ds2")
parents(jk)
parent(jk, "ds1")

# Internal structure
dput(jk) # check out internal structure

TODO

@averissimo averissimo changed the title initial support for list-based JoinKeys Simplifies JoinKeys from R6 to S3 list-like object Oct 30, 2023
@gogonzo
Copy link
Contributor

gogonzo commented Nov 1, 2023

@averissimo JoinKeys is not a list. Consequence of this is that we still need teal.data as dependency of teal.slice and teal.transform.

jk <- join_keys(
  join_key("df1", "df1", c("id", "id2")),
  join_key("df1", "df2", c("id" = "id")),
  join_key("df1", "df3", c("id" = "id")),
  join_key("df2", "df2", c("id", "id2")),
  join_key("df2", "df3", c("id", "id2")),
  join_key("df3", "df3", c("id", "id2"))
)

inherits(js, "list")
#>  FALSE
jk[["df1"]]
#>  NULL

@gogonzo gogonzo added the core label Nov 1, 2023
@averissimo
Copy link
Contributor Author

averissimo commented Nov 2, 2023

@gogonzo the join_keys() constructor was/is still using the JoinKeys R6 method. The getter version of join_keys already returns the new class (Placeholder)

Today I'm removing R6 and replacing it with new class/list after modifying all tests to use a unifying API instead of r6$method()

As it is, the code on your comment should NOW output what you expect.

jk <- join_keys(
  join_key("df1", "df1", c("id", "id2")),
  join_key("df1", "df2", c("id" = "id")),
  join_key("df1", "df3", c("id" = "id")),
  join_key("df2", "df2", c("id", "id2")),
  join_key("df2", "df3", c("id", "id2")),
  join_key("df3", "df3", c("id", "id2"))
)
# jk <- join_keys(jk) # removed this with my latest push (as R6 class was completely removed)
inherits(jk, "list")
#> [1] TRUE

jk[["df1"]]
#> $df1
#>    id   id2 
#>  "id" "id2" 
#> 
#> $df2
#>   id 
#> "id" 
#> 
#> $df3
#>   id 
#> "id"

@averissimo
Copy link
Contributor Author

The JoinKeys has been migrated inside of teal.data 🎉 🥳

See first comment (header?) for list of notable changes

R/join_key.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
@vedhav
Copy link
Contributor

vedhav commented Nov 6, 2023

@averissimo This is no longer a caveat, right?

Caveat is that this (as it is) doesn't protect against direct changes obj[[ds1]][[ds2]] <- "new_key" will not propagate the change to the symmetric relationship obj[[ds2]][[ds1]]

@averissimo
Copy link
Contributor Author

@vedhav you are right! I removed that from the comment

@averissimo
Copy link
Contributor Author

averissimo commented Nov 6, 2023

@gogonzo can you give some context on using empty name on pairs?

  # From tests
  # set key on non-empty variable name equal to ""
  jk <- mutate_join_keys(join_keys(), "d1", "d2", c("A" = "B", "C" = ""))
  expect_equal(jk["d1", "d2"], setNames(c("B", ""), c("A", "C")))

Previous and current implementation break symmetry.

Do we need to support empty keys?

If not then my suggestion is to either take the name, or if there is no name, throw an error

mutate_join_keys(join_keys(), "d1", "d2", c("A" = "B", "C" = ""))
#> A JoinKeys object containing foreign keys between 2 datasets:
#> $d1
#> $d1$d2
#>   A   C 
#> "B"  "" 
#> 
#> 
#> $d2
#> $d2$d1
#>   B     
#> "A" "C"
  
mutate_join_keys(join_keys(), "d2", "d1", setNames(c("B" = "A", "C"), c("B", "")))
#> A JoinKeys object containing foreign keys between 2 datasets:
#> $d1
#> $d1$d2
#>   A   C 
#> "B" "C" 
#> 
#> 
#> $d2
#> $d2$d1
#>   B   C 
#> "A" "C"

Copy link
Contributor

github-actions bot commented Nov 6, 2023

badge

Code Coverage Summary

Filename                                 Stmts    Miss  Cover    Missing
-------------------------------------  -------  ------  -------  -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
R/as_cdisc.R                                39       4  89.74%   99-102
R/Callable.R                                45       0  100.00%
R/CallableCode.R                            36       2  94.44%   26, 63
R/CallableFunction.R                        88       3  96.59%   159-161
R/CallablePythonCode.R                      58      58  0.00%    21-227
R/cdisc_data.R                              41       1  97.56%   55
R/cdisc_join_keys.R                         11      11  0.00%    17-35
R/CDISCTealDataConnector.R                  20       3  85.00%   31, 36, 49
R/CDISCTealDataset.R                        46      11  76.09%   108-115, 204-206
R/CDISCTealDatasetConnector.R               26       1  96.15%   116
R/CodeClass.R                              111       1  99.10%   157
R/data_label.R                              36      13  63.89%   33-37, 56-63, 103
R/datanames.R                               11       0  100.00%
R/deep_clone_r6.R                            9       0  100.00%
R/dummy_function.R                           5       1  80.00%   23
R/formatters_var_labels.R                   49      21  57.14%   30, 38, 43-44, 46, 53, 99, 129-151
R/get_attrs.R                                2       2  0.00%    12-45
R/get_code.R                               173      19  89.02%   87, 140-143, 196-197, 207-208, 266, 297, 333, 337, 372, 381-385
R/get_dataname.R                             5       1  80.00%   39
R/get_dataset_label.R                        3       0  100.00%
R/get_dataset.R                             13       8  38.46%   39, 55, 80-86
R/get_datasets.R                            11       3  72.73%   96, 116, 140
R/get_join_keys.R                           10       0  100.00%
R/get_key_duplicates.R                      37       7  81.08%   41-47, 54-55
R/get_keys.R                                15       7  53.33%   68-69, 126-146
R/get_raw_data.R                            24      11  54.17%   168-181
R/include_css_js.R                           9       1  88.89%   20
R/is_pulled.R                                4       0  100.00%
R/join_key.R                                32       0  100.00%
R/join_keys-c.R                             26       0  100.00%
R/join_keys-extract.R                      122       0  100.00%
R/join_keys-names.R                         15       0  100.00%
R/join_keys-parents.R                       28       0  100.00%
R/join_keys-print.R                         47       0  100.00%
R/join_keys-utils.R                         87       1  98.85%   130
R/join_keys.R                               23       0  100.00%
R/load_dataset.R                            25      18  28.00%   58-63, 87-186
R/MAETealDataset.R                         138      57  58.70%   53, 115, 153-208, 224-229, 236-245, 282, 323-339
R/mutate_dataset.R                          18       0  100.00%
R/set_args.R                                10       5  50.00%   34-38
R/teal_data-class.R                         23       1  95.65%   68
R/teal_data.R                               39       2  94.87%   32, 59
R/TealData.R                               233     114  51.07%   183, 210, 222-291, 335-342, 375-380, 382, 384-389, 391, 408-453
R/TealDataAbstract.R                       232      24  89.66%   72, 85-88, 97-106, 215-218, 429, 454-458, 480, 486
R/TealDataConnection.R                     297     180  39.39%   58-59, 64, 67, 70, 106-163, 183, 186-188, 194-200, 205-207, 233, 238, 254-277, 287, 300, 321, 325-330, 333-336, 358-360, 364-371, 374-377, 392-406, 425-426, 446-517, 535-543, 545, 549-564, 567-570, 602, 608-612, 626, 661-663, 672-674
R/TealDataConnector.R                      196     102  47.96%   178, 190, 194, 207, 210-219, 221, 229-238, 321-325, 383-488
R/TealDataset.R                            367      23  93.73%   141-151, 382-386, 442-451, 503
R/TealDatasetConnector_constructors.R      270      52  80.74%   177-214, 727-732, 930-1006
R/TealDatasetConnector.R                   326      90  72.39%   169, 237, 251, 256, 270, 433, 456-495, 525, 540-570, 660, 670, 679-686, 699, 714-741
R/testhat-helpers.R                         54       0  100.00%
R/to_relational_data.R                      57       8  85.96%   34-35, 39, 98, 105, 111, 122-123
R/topological_sort.R                        32       0  100.00%
R/utils.R                                   56       9  83.93%   22-23, 27, 76-83
R/validate_data_args.R                      32       0  100.00%
R/zzz.R                                     11      11  0.00%    4-19
TOTAL                                     3733     886  76.27%

Diff against main

Filename                 Stmts    Miss  Cover
---------------------  -------  ------  --------
R/cdisc_join_keys.R        +11     +11  +100.00%
R/get_join_keys.R           +2      -2  +25.00%
R/join_key.R               +32       0  +100.00%
R/join_keys-c.R            +26       0  +100.00%
R/join_keys-extract.R     +122       0  +100.00%
R/join_keys-names.R        +15       0  +100.00%
R/join_keys-parents.R      +28       0  +100.00%
R/join_keys-print.R        +47       0  +100.00%
R/join_keys-utils.R        +87      +1  +98.85%
R/join_keys.R              +23       0  +100.00%
R/teal_data.R               -7       0  -0.78%
R/TealData.R                +5      +1  +0.63%
R/testhat-helpers.R        +54       0  +100.00%
R/zzz.R                     +5      +5  +100.00%
TOTAL                     +450     +16  +1.39%

Results for commit: cd4b2ca

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

Copy link
Contributor

github-actions bot commented Nov 6, 2023

Unit Tests Summary

       1 files       36 suites   14s ⏱️
   445 tests    444 ✔️ 1 💤 0
1 056 runs  1 055 ✔️ 1 💤 0

Results for commit cd4b2ca.

♻️ This comment has been updated with latest results.

@gogonzo
Copy link
Contributor

gogonzo commented Nov 6, 2023

@gogonzo can you give some context on using empty name on pairs?
Do we need to support empty keys?

Yup, I think it was just a consequence of having named (for join_key("ds1", "ds2", c(a = "b"))) and unnamed join_key("ds1", "ds2", "id")). It wasn't empty key - it was just the same key. So it should be c("id" = "id"). There should be no unnamed keys.

If not then my suggestion is to either take the name, or if there is no name throw an error

Yes, I think for consistency reason it could be always named even if both datasets share the same key name.

Copy link
Contributor

@gogonzo gogonzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its exactly what we want, but little too much in my opinion. There are few functions exported which are not needed at all. I also think that [ should return JoinKeys object not a list for single dataset.

R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/parents.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/parents.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
@averissimo
Copy link
Contributor Author

averissimo commented Nov 7, 2023

@gogonzo made a bunch of changes in response to your review. Check out the resolved conversation for the changes and I left some conversations that have follow-ups as unresolved.

Yes, I think for consistency reason it could be always named even if both datasets share the same key name.

So I'm changing the behaviour of mutate_join_keys(join_keys(), "d1", "d2", c("A" = "B", "C" = "")) so that the C name is applied as value ("C" = "" -> "C" = "C")

R/get_join_keys.R Outdated Show resolved Hide resolved
Copy link
Contributor

@chlebowa chlebowa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first look, non-exhaustive

R/join_keys.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
inst/WORDLIST Outdated Show resolved Hide resolved
inst/WORDLIST Outdated Show resolved Hide resolved
inst/WORDLIST Show resolved Hide resolved
@averissimo
Copy link
Contributor Author

averissimo commented Nov 8, 2023

@insightsengineering/nest-core-dev

BLUF: should we keep join_keys(obj) <- obj2 as a merge operation instead of setter?

Context:

I'm having second thoughts about part of the new API of JoinKeys after discussing with part of the team.

In particular: join_keys(jk1) <- jk2 will merge jk2 into jk1 (instead of overwritting)

Why? Initially I thought it was a needed limitation with working along with [<- and [[<- , but now I know better 😅

🔴 Why is this a problem: It breaks with the expected behavior on R (such as names(...) <- and others)

🟠 Reasons to keep? I find it very clean for app developers to use for consecutive operations, but requires some adjusting. It is also slighlty easier to read the code than the subset assigment [<- or [[<-.

join_keys(jk1) <- merge_join_key(jk1, join_key("a", "b", "col"))
join_keys(jk1) <- merge_join_key(jk1, join_keys(join_key("a", "b", "col"), join_key("a", "d", "col2"))

# vs
join_keys(jk1) <- join_key("a", "b", "col")
join_keys(jk1) <- join_keys(join_key("a", "b", "col"), join_key("a", "d", "col2"))

🟢 Suggestion(s):

  • Open to discussion 😄
  • Keep as is
  • Setter only & Export (back) the merge_join_keys method
    • (or/and add a merge assignment)
join_keys(obj) <- merge_join_keys(obj, obj2) # or extend R's base::merge / or another base function

merge_join_keys(obj) <- obj2

@vedhav
Copy link
Contributor

vedhav commented Nov 8, 2023

should we keep join_keys(obj) <- obj2 as a merge operation instead of setter?

This confused me for the better part of today. But, once I got the hang of it I started to like this because it's such a simple way of creating join keys. Creating data and join keys is the main task for the app developers and I believe that just learning this one function and getting to know what it does should empower them to build quicker teal apps. They might have trouble building their first or second teal app but aftewards it's quicker and easier imo.

@chlebowa
Copy link
Contributor

chlebowa commented Nov 8, 2023

Since the new join keys class is a list and we use this fact to have usual list mechanics, I would rather not depart from that.
How about using c for "merging" keys? Can that be done?

inst/WORDLIST Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
@m7pr
Copy link
Contributor

m7pr commented Nov 8, 2023

Hey, I just had the same conversation and confusion today when trying to figure out how to merge keys
image
and I figured that the most natural way would be to merge with c( (as this is also used for other basic R classes)

join_keys(data) <- c(keys_1, keys_2)

and it's better not to merge IMHO on <- as this is the assignment/overwrite operator.

It was very confusing to see join_keys(data) <- join_keys(join_key("MAE", "MAE", c("STUDYID", "USUBJID"))) and not see data on RHS at all. Of course I didn't read any documentation and was just trying to think how would the developer implement that for the most seamless usability.

@@ -31,7 +31,7 @@
#' })
#'
cdisc_data <- function(...,
join_keys = teal.data::cdisc_join_keys(...),
join_keys = teal.data::default_cdisc_join_keys[names(rlang::list2(...))],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good-looking default value.
I suggest to define the function with cdisc_data(..., join_keys, code = character(0), check = FALSE) and have join_keys default to teal.data::default_cdisc_join_keys[names(rlang::list2(...))] if missing.

R/teal_data.R Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deprecate check argument?

R/join_key.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
R/join_key.R Outdated Show resolved Hide resolved
R/join_keys.R Outdated Show resolved Hide resolved
R/join_keys-parents.R Outdated Show resolved Hide resolved
R/join_keys-parents.R Outdated Show resolved Hide resolved
R/join_keys-parents.R Outdated Show resolved Hide resolved
R/join_keys-parents.R Outdated Show resolved Hide resolved
@averissimo averissimo enabled auto-merge (squash) November 20, 2023 08:10
@averissimo averissimo merged commit 22c7d82 into main Nov 20, 2023
23 checks passed
@averissimo averissimo deleted the 78_simplify_joinkeys@main branch November 20, 2023 08:55
edelarua pushed a commit to insightsengineering/teal.modules.clinical that referenced this pull request Dec 1, 2023
fixes
#842
part of
insightsengineering/nestdevs-tasks#41
insightsengineering/teal.data#184

 

- [x] add thenews section for change.
- [x]  update the examples in the documentation to pass teal_data.
- [x] update join_keys calls once the pull request is completed at
insightsengineering/teal.data#184.
- [x]  Revise and update the vignettes accordingly.
- [ ] make version bump once
insightsengineering/teal.data#184. is merged.

---------

Co-authored-by: 27856297+dependabot-preview[bot]@users.noreply.github.com <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: vedhav <[email protected]>
walkowif pushed a commit to walkowif/teal that referenced this pull request Dec 5, 2023
)

Related to [teal.data PR
insightsengineering#184](insightsengineering/teal.data#184)
Make changes to `teal` because of the refactor to the JoinKeys class
from R6 to S3.

---------

Signed-off-by: Vedha Viyash <[email protected]>
Co-authored-by: André Veríssimo <[email protected]>
Co-authored-by: Aleksander Chlebowski <[email protected]>
Co-authored-by: go_gonzo <[email protected]>
pawelru pushed a commit to insightsengineering/teal.goshawk that referenced this pull request Apr 8, 2024
fixes #243
part of
insightsengineering/nestdevs-tasks#41
insightsengineering/teal.data#184


- [x]  add the news section for change.
- [x]  update the examples in the documentation to pass teal_data.
- [x] update join_keys calls once the pull request is completed at
insightsengineering/teal.data#184.
- [x]  Revise and update the vignettes accordingly.
- [ ] make version bump once
insightsengineering/teal.data#184. is merged.

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: 27856297+dependabot-preview[bot]@users.noreply.github.com <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: vedhav <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Change JoinKeys to something simpler
7 participants