Important: core__encounter frequently missing START date and rarely missing END date #330
Comments
May be due to other date resolutions being used instead of day? Any idea if these are category = documents or History? Cerner treats those oddly.
So I think we are at risk of designing something to just handle our legacy Cerner dataset here, which is likely to cause problems down the line. This is somewhat related to #205. What would be more reliable long term is a set of rules for 'what do we do when dates are missing in way X', and it would be desirable to try to do this in a holistic way that scales, rather than targeting a specific field. So if I could get a requirements writeup, on a per-resource basis:
With this we can at least document expectations, which would help to head off certain kinds of questions (e.g. "quality metrics failed on a bunch of dates, but I still see the dates in the core tables, what gives?").
While we're here, it might also be worth talking about our date bloat. We might be able to slim down some tables by including only the highest-resolution field, and having helpers for getting dates of a certain type.
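As a rough illustration of the "helpers" idea, here is a minimal sketch that derives coarser date values from a single day-resolution field. The function name `truncate_date`, the set of resolutions, and the Monday week start are all assumptions made for the example, not anything the project defines.

```python
from datetime import date, timedelta


def truncate_date(day: date, resolution: str) -> date:
    """Return the first day of the period (day/week/month/year) containing `day`."""
    if resolution == "day":
        return day
    if resolution == "week":
        # Assumption: weeks start on Monday (date.weekday() returns 0 for Monday).
        return day - timedelta(days=day.weekday())
    if resolution == "month":
        return day.replace(day=1)
    if resolution == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unknown resolution: {resolution!r}")
```

With a helper like this, a table would only need to store the day-level field, and week, month, or year values could be computed on demand.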
I think there might be value in adding a new "smoothed over" version of a date field when we do this, and leaving the original odd period data in the core table too, because a field with just an end date but no start date is saying something different than a field with the same value for both (i.e. the difference between "start date unknown" and "over in an instant"). For many use cases, maybe that doesn't matter. But the correctness-bug in my soul worries about entirely papering over that.
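One way this could look, sketched with an in-memory SQLite table so it runs anywhere: the original `period_start_day` is left untouched and a smoothed column plus a flag are added next to it. The column names `period_end_day`, `period_start_day_smoothed`, and `period_start_imputed` are assumptions for illustration, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core__encounter ("
    "encounter_ref TEXT, period_start_day TEXT, period_end_day TEXT)"
)
# Made-up rows: a normal stay, a missing start, and a same-day visit.
conn.executemany(
    "INSERT INTO core__encounter VALUES (?, ?, ?)",
    [
        ("Encounter/1", "2020-01-05", "2020-01-07"),
        ("Encounter/2", None, "2020-02-10"),
        ("Encounter/3", "2020-03-01", "2020-03-01"),
    ],
)

SMOOTHED_SQL = """
SELECT
    encounter_ref,
    period_start_day,                      -- original value, possibly NULL
    period_end_day,
    COALESCE(period_start_day, period_end_day) AS period_start_day_smoothed,
    (period_start_day IS NULL AND period_end_day IS NOT NULL) AS period_start_imputed
FROM core__encounter
ORDER BY encounter_ref
"""
rows = conn.execute(SMOOTHED_SQL).fetchall()
for row in rows:
    print(row)
```

Keeping the flag means "start date unknown" (Encounter/2) stays distinguishable from "over in an instant" (Encounter/3), even though both end up with equal smoothed start and end dates.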
select count(distinct encounter_ref) from core__encounter where period_start_day is null
2,147,614
That's about 10% of 20M visits (2,147,614 of 20,210,548), which is A LOT.
Workarounds:
(A) Do nothing.
Researchers may be very surprised that 10% of visits are missing a start date, and might have preferred ANY encounter date to be present.
(B) Set START date = END date, which is almost always present.
For all encounter classes other than IMP or OBSENC, this is a reasonable assumption.
This would settle ALL but 9,582 encounters, which is a dramatic improvement.
class_code  cnt
AMB         2137359
IMP         9582
NULL        588
EMER        85
(C) Something else.
@comorbidity's vote is for (B), as it fixes nearly all scenarios.
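For discussion, here is a minimal sketch of what option (B) with the IMP/OBSENC exception might look like, run against an in-memory SQLite table with made-up rows. The `period_end_day` column name is an assumption, and the real tables may live in a different SQL engine. One detail to watch: in SQL, `class_code NOT IN ('IMP', 'OBSENC')` evaluates to NULL when `class_code` is NULL, so the NULL-class encounters would silently be skipped unless they are handled explicitly, as the `COALESCE` below does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE core__encounter ("
    "encounter_ref TEXT, class_code TEXT, "
    "period_start_day TEXT, period_end_day TEXT)"
)
# Made-up rows covering the class codes seen in the breakdown above.
conn.executemany(
    "INSERT INTO core__encounter VALUES (?, ?, ?, ?)",
    [
        ("Encounter/1", "AMB", None, "2020-01-05"),
        ("Encounter/2", "IMP", None, "2020-01-09"),
        ("Encounter/3", None, None, "2020-02-01"),
        ("Encounter/4", "EMER", "2020-03-01", "2020-03-01"),
    ],
)

OPTION_B_SQL = """
SELECT
    encounter_ref,
    CASE
        WHEN period_start_day IS NOT NULL THEN period_start_day
        -- COALESCE keeps NULL class_code rows eligible; a bare NOT IN would skip them
        WHEN COALESCE(class_code, '') NOT IN ('IMP', 'OBSENC') THEN period_end_day
        ELSE NULL
    END AS period_start_day
FROM core__encounter
ORDER BY encounter_ref
"""
filled = dict(conn.execute(OPTION_B_SQL).fetchall())
still_missing = sorted(ref for ref, start in filled.items() if start is None)
print(filled)
print(still_missing)
```

In this toy data only the IMP encounter is left without a start date, which mirrors the "all but 9,582" outcome described for (B) if NULL-class encounters are meant to be filled too.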