Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(deid): drop valueString instead of passing it through #364

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

mikix
Copy link
Contributor

@mikix mikix commented Nov 19, 2024

Previously, we would let Observation.valueString (and its component form) through, optionally masking it with philter if you passed --philter in.

But it held PHI too often, and if we ever will use the field, it will likely be via ETL-side NLP. We don't need to keep the string intact into Athena.

So now, we unconditionally drop Observation.valueString, replacing it with a data-absent-reason extension marker (so consumers know it was present).

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

Previously, we would let Observation.valueString (and its component
form) through, optionally masking it with philter if you passed
--philter in.

But it held PHI too often, and if we ever will use the field,
it will likely be via ETL-side NLP. We don't need to keep the string
intact into Athena.

So now, we unconditionally drop Observation.valueString, replacing it
with a data-absent-reason extension marker (so consumers know it
_was_ present).
@mikix mikix marked this pull request as ready for review November 19, 2024 21:06
key == "valueString",
)
):
raise MaskValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm sure this is correct, but just for my own edification, where is this caught?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair question 😄

The loop of the scrubber is that it walks the JSON tree, and runs all these little scrubbing handlers that if they recognize something they should scrub, modify the value or raise an exception like this.

So the answer is: one layer up in the Scrubber code. See

except MaskValue:

Copy link

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
3609 3548 98% 98% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
cumulus_etl/deid/scrubber.py 100% 🟢
TOTAL 100% 🟢

updated for commit: 741fb34 by action🐍

@mikix mikix merged commit 3e6a17e into main Nov 20, 2024
3 checks passed
@mikix mikix deleted the mikix/mask-valuestring branch November 20, 2024 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants