random cdisc data very slow for larger data #21

Open
cicdguy opened this issue Aug 5, 2021 · 3 comments
Labels
bug Something isn't working sme

Comments

@cicdguy
Contributor

cicdguy commented Aug 5, 2021

Original message

Running the following code takes a long time! This is on r.roche.com, R 3.6.3.

NEST/nest_on_bee/master/bee_nest_utils.R")
bee_use_nest(release = "2021_05_05")
ADSL <- radsl(N = 1002)
ADLB <- radlb(ADSL)

I reduced this from 15000 as it took way too long. Using system.time I get the following results:

user system elapsed
37.852 0.584 38.436
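
For anyone wanting to check how the run time grows with the number of subjects, here is a minimal sketch of timing a generator at several sizes with system.time. The names `time_generator` and `toy_generator` are hypothetical and used only for illustration; the toy generator is a base-R stand-in that produces 21 rows per subject, not the random.cdisc.data implementation.

```r
# Hypothetical helper: time a data generator at several input sizes.
# `gen` is any function that takes a single size argument.
time_generator <- function(gen, sizes) {
  elapsed <- vapply(
    sizes,
    function(n) unname(system.time(gen(n))[["elapsed"]]),
    numeric(1)
  )
  data.frame(n = sizes, elapsed = elapsed)
}

# Stand-in generator (assumption: NOT the real radsl/radlb code).
# It builds 21 rows per subject, roughly matching the record count reported here.
toy_generator <- function(n) {
  data.frame(
    USUBJID = rep(seq_len(n), each = 21),
    AVAL = rnorm(n * 21)
  )
}

timings <- time_generator(toy_generator, sizes = c(100, 1000, 10000))
print(timings)
```

With random.cdisc.data loaded, the same helper could presumably be pointed at the real functions, for example `time_generator(function(n) radlb(radsl(N = n)), sizes = c(100, 500, 1002))`, to see whether elapsed time grows linearly or worse with N.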

This is an extremely long time to make a dataset with 21,000 records! I know random.cdisc really only exists for dummy data, but this seems like extremely poor performance.

Provenance:

Creator: martik32

TODO

Improve performance. A few suggestions:

  1. Use mclapply from the parallel package
  2. Use data.table if necessary
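
The two suggestions above can be sketched roughly as follows. This is only an illustration under stated assumptions: `simulate_subject` is a hypothetical per-subject simulator, not how radlb is actually implemented, and whether parallelism helps at all depends on where the time is really spent.

```r
library(parallel)

# Hypothetical per-subject simulator (assumption: not the radlb implementation).
simulate_subject <- function(id, n_records = 21) {
  data.frame(
    USUBJID = id,
    VISITN = seq_len(n_records),
    AVAL = rnorm(n_records, mean = 50, sd = 10)
  )
}

# mclapply relies on forking, which is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

ids <- seq_len(200)
set.seed(1, kind = "L'Ecuyer-CMRG") # RNG kind suited to parallel streams
chunks <- mclapply(ids, simulate_subject, mc.cores = n_cores)

# Bind once at the end instead of growing a data frame inside a loop.
# With data.table installed, data.table::rbindlist(chunks) is a faster alternative.
labs <- do.call(rbind, chunks)
```

Profiling first (for example with `Rprof`) would show whether the cost sits in the per-subject work or in repeated binding and copying; in the latter case, a single bind at the end may matter more than adding cores.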
@cicdguy cicdguy added the bug Something isn't working label Aug 5, 2021
@nsteed15 nsteed15 self-assigned this Apr 26, 2022
@shajoezhu shajoezhu added the sme label Apr 29, 2022
@nikolas-burkoff
Contributor

See: internal_github_url/NEST/random.cdisc.data/issues/242 - I suspect there are a lot of places which could be improved

In the past, users called rcd functions such as radsl directly. Now we don't release rcd to users (we only use it to create a snapshot that is saved in scda), so I guess there's less value in optimizing this than there was when the issue was created.

@gogonzo
Contributor

gogonzo commented May 23, 2022

@shajoezhu does it matter for you guys? We don't use rcd at all; I'd close it if it were up to us. NEST users should switch to scda instead.

@shajoezhu
Contributor

Thanks @gogonzo, we will put this back into the backlog. I agree we are using scda data most of the time for our NEST package development, but I remember discussions that teams were using these functions to create large fake data for stress testing tasks. Let's keep this open, please. Thanks
