Important: core__encounter frequently missing START date and rarely missing END date #330

Open
comorbidity opened this issue Dec 31, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@comorbidity
Contributor

select count(distinct encounter_ref) from core__encounter where period_start_day is null
2,147,614

That's over 10% of 20M visits (20,210,548), which is A LOT.

Workarounds:
(A) Do nothing.
Researchers may be very surprised that 10% of visits are missing a start date, and might have preferred ANY encounter date to be present.

(B) Set START date = END date, which is almost always present.
For every Encounter class other than IMP or OBSENC, this is a reasonable assumption.

This would settle ALL but 9,582 encounters (the IMP rows below), which is a dramatic improvement.
class_code  cnt
AMB         2,137,359
IMP         9,582
NULL        588
EMER        85

(C) Something else.

@comorbidity's vote is for (B), as it fixes nearly all scenarios.
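
A minimal sketch of what option (B) could look like as a query, run here against an in-memory SQLite table so it is self-contained. The `period_end_day` column name and the sample rows are assumptions for illustration, and the production SQL dialect may differ. Rows with a NULL class are filled too, which matches the "all but 9,582" count above.

```python
import sqlite3

# Only encounter_ref, period_start_day and class_code appear in this issue;
# period_end_day and all sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE core__encounter (
        encounter_ref TEXT, class_code TEXT,
        period_start_day TEXT, period_end_day TEXT)"""
)
conn.executemany(
    "INSERT INTO core__encounter VALUES (?, ?, ?, ?)",
    [
        ("Encounter/1", "AMB", None, "2020-01-05"),           # filled from end
        ("Encounter/2", "IMP", None, "2020-02-10"),           # inpatient: stays NULL
        ("Encounter/3", None, None, "2020-03-01"),            # unknown class: filled
        ("Encounter/4", "EMER", "2020-04-01", "2020-04-02"),  # already has a start
    ],
)

# Option (B): reuse the end date as the start date, except for classes
# where a stay can plausibly span several days.
OPTION_B_SQL = """
SELECT encounter_ref,
       CASE
           WHEN period_start_day IS NULL
                AND (class_code IS NULL OR class_code NOT IN ('IMP', 'OBSENC'))
           THEN period_end_day
           ELSE period_start_day
       END AS period_start_day
FROM core__encounter
ORDER BY encounter_ref
"""
rows = conn.execute(OPTION_B_SQL).fetchall()
```

Only the inpatient row keeps a NULL start date; the explicit `class_code IS NULL` check is needed because `NOT IN` never matches a NULL value.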

@comorbidity added the bug label Dec 31, 2024
@James-R-Jones
Contributor

Maybe this is due to other date resolutions being used instead of day? Any idea if these are category = documents or History? Cerner treats those oddly.

@dogversioning
Contributor

So I think we are at risk of designing something to just handle our legacy Cerner dataset here, which is likely to cause problems down the line.

This is somewhat related to #205. What would be more reliable long term is a set of 'what do we do when dates are missing in way X' rules, and it would be desirable to do this in a holistic way that scales, rather than targeting a specific field.

So if I could get a requirements writeup, on a per resource basis:

  • Which dates are substitutable in the context of a resource? (example: if period is missing in encounter, can/should we substitute participant.period?)
  • For periods, do we always attempt to populate a null value in start/end if the other is missing?
  • If multiple date types are allowed in a field (think lab observations), should we attempt to be clever at inspecting all of them?

With this we can document expectations, which would help to head off certain kinds of questions (e.g. quality metrics failed on a bunch of dates, but I still see the dates in the core tables, what gives?)
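
To make the ask concrete, here is a rough sketch of how such per-resource rules could be written down once and applied uniformly. Everything in it (the `DATE_FALLBACKS` table, the field names, `resolve_date`) is hypothetical and only illustrates the shape of a requirements writeup, not an existing API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rule table: for each (resource, target date), the fields to
# try in order. The field names are illustrative, not the project's schema.
DATE_FALLBACKS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("Encounter", "start"): ("period_start", "period_end"),
    ("Encounter", "end"): ("period_end",),
}


def resolve_date(
    resource: str, target: str, row: Dict[str, Optional[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source_field), or (None, None) if nothing is populated."""
    for field in DATE_FALLBACKS.get((resource, target), ()):
        value = row.get(field)
        if value is not None:
            return value, field
    return None, None


row = {"period_start": None, "period_end": "2020-01-05"}
start, source = resolve_date("Encounter", "start", row)
```

Returning the source field alongside the value is one way to keep substitutions visible, so a date that failed a quality metric but still shows up in a core table can be traced back to the rule that filled it.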

@dogversioning
Contributor

dogversioning commented Jan 2, 2025

While we're here, it might also be worth talking about our date bloat. We might be able to slim down some tables by including only the highest-resolution field, and having helpers for getting dates of a certain type.
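
As an illustration of the helper idea, a small sketch that derives coarser dates from a single day-resolution value. The function name `truncate_date` and the set of resolutions are assumptions, not something the project defines.

```python
from datetime import date, timedelta


def truncate_date(day: date, resolution: str) -> date:
    """Round a day-resolution date down to the start of its week, month or year."""
    if resolution == "day":
        return day
    if resolution == "week":
        # Weeks are assumed to start on Monday (date.weekday() == 0).
        return day - timedelta(days=day.weekday())
    if resolution == "month":
        return day.replace(day=1)
    if resolution == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unknown resolution: {resolution!r}")


opened = date(2024, 12, 31)
opened_month = truncate_date(opened, "month")
```

With a helper like this, a table would only need to store the day column, and the coarser columns could be computed on demand.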

@mikix
Contributor

mikix commented Jan 6, 2025

I think there might be value in adding a new "smoothed over" version of a date field when we do this, and leaving the original odd period data in the core table too. Because a field with just an end date but no start date is saying something different than a field with the same value for both (i.e. the difference between "start date unknown" and "over in an instant").

For many use cases, maybe that doesn't matter. But the correctness-bug in my soul worries about entirely papering over that.
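
A small sketch of that distinction, assuming a hypothetical `EncounterPeriod` record: the original start and end are kept untouched, and the smoothed value is derived next to them along with a flag saying whether it was inferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncounterPeriod:
    """Hypothetical record that keeps the raw period and derives a smoothed start."""

    start: Optional[str]
    end: Optional[str]

    @property
    def start_smoothed(self) -> Optional[str]:
        # Fall back to the end date only when the start is missing.
        return self.start if self.start is not None else self.end

    @property
    def start_was_inferred(self) -> bool:
        return self.start is None and self.end is not None


unknown_start = EncounterPeriod(start=None, end="2020-01-05")
instant = EncounterPeriod(start="2020-01-05", end="2020-01-05")
```

Both records report the same smoothed start, but they still compare as different and only the first is flagged as inferred, so "start date unknown" and "over in an instant" remain distinguishable.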
