ci: add full stack NLP regression test #295
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,7 +23,7 @@ jobs: | |
python-version: ["3.10"] | ||
|
||
steps: | ||
- - uses: actions/checkout@v3 | ||
+ - uses: actions/checkout@v4 | ||
|
||
- name: Set up Python ${{ matrix.python-version }} | ||
uses: actions/setup-python@v4 | ||
|
@@ -37,7 +37,7 @@ jobs: | |
pip install .[tests] | ||
|
||
- name: Check out MS tool | ||
- uses: actions/checkout@v3 | ||
+ uses: actions/checkout@v4 | ||
with: | ||
repository: microsoft/Tools-for-Health-Data-Anonymization | ||
path: mstool | ||
|
@@ -57,10 +57,51 @@ jobs: | |
run: | | ||
python -m pytest | ||
|
||
nlp-regression: | ||
runs-on: ubuntu-latest | ||
env: | ||
UMLS_API_KEY: ${{ secrets.UMLS_API_KEY }} | ||
steps: | ||
- uses: actions/checkout@v4 | ||
|
||
- name: Install Docker | ||
uses: docker/setup-buildx-action@v3 | ||
|
||
- name: Build ETL image | ||
uses: docker/build-push-action@v5 | ||
with: | ||
load: true # just build, no push | ||
tags: smartonfhir/cumulus-etl:latest | ||
|
||
- name: Download NLP images | ||
run: docker compose --profile covid-symptom up -d --quiet-pull | ||
Review comment: It's not clear how best to cache the pulled images...? Do you think GitHub does some network-side caching? There are some rando actions in the GitHub Actions marketplace that promise to cache pulled images... But it might be easy enough to do on our own too. On the other hand... it's just GitHub's resources we're using, which makes me suspect they do some caching on their side just to save their own money, and maybe we don't have to be clever.

Reply: I would assume they've got something akin to Artifactory instance(s) inside their infrastructure. Since it's open source and we don't pay for time, I don't think we need to worry about optimizing. But if we did, it would probably be something similar: add a cache box and update the Docker hosts to point at it?
||
|
||
- name: Run NLP | ||
run: | | ||
export DATADIR=$(realpath tests/data/nlp-regression) | ||
|
||
# Run the NLP task | ||
docker compose run --rm \ | ||
--volume $DATADIR:/in \ | ||
cumulus-etl \ | ||
/in/input \ | ||
/in/run-output \ | ||
/in/phi \ | ||
--output-format=ndjson \ | ||
--task covid_symptom__nlp_results | ||
|
||
# Compare results | ||
export OUTDIR=$DATADIR/run-output/covid_symptom__nlp_results | ||
sudo chown -R $(id -u) $OUTDIR | ||
sed -i 's/"generated_on": "[^"]*", //g' $OUTDIR/*.ndjson | ||
diff -upr $DATADIR/expected-output $OUTDIR | ||
|
||
echo "All Good!" | ||
Review comment: No emojis? I am SHOCKED.

Reply: I thought about it! 😄
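For anyone who wants to reproduce the "Compare results" step locally without `sed` and `diff`, here is a minimal Python sketch of the same idea: drop the volatile `generated_on` field, then compare what is left. The helper names (`strip_key`, `normalize_ndjson`, `ndjson_matches`) and the sample rows are made up for illustration and are not part of cumulus-etl. It also assumes `generated_on` may appear at any nesting level, which the workflow itself does not confirm.

```python
import json


def strip_key(value, key="generated_on"):
    """Return a copy of a parsed JSON value with every `key` entry removed."""
    if isinstance(value, dict):
        return {k: strip_key(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [strip_key(v, key) for v in value]
    return value


def normalize_ndjson(text):
    """Parse NDJSON text into a list of rows, ignoring blank lines."""
    return [strip_key(json.loads(line)) for line in text.splitlines() if line.strip()]


def ndjson_matches(expected_text, actual_text):
    """Compare two NDJSON documents row by row, ignoring `generated_on`."""
    return normalize_ndjson(expected_text) == normalize_ndjson(actual_text)


# Made-up sample rows, only to show the comparison in action.
expected = '{"id": "doc.0", "task_version": 4}\n'
actual = '{"id": "doc.0", "generated_on": "2024-01-01T00:00:00+00:00", "task_version": 4}\n'
print(ndjson_matches(expected, actual))
```

One difference from the workflow: `diff` is a textual comparison, so it also flags changes in key order or whitespace, while this sketch compares parsed JSON and ignores both.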
||
|
||
lint: | ||
runs-on: ubuntu-22.04 | ||
steps: | ||
- - uses: actions/checkout@v3 | ||
+ - uses: actions/checkout@v4 | ||
|
||
- name: Install linters | ||
# black is synced with the .pre-commit-hooks version | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -73,7 +73,12 @@ async def covid_symptoms_extract( | |
def is_covid_match(m: ctakesclient.typesystem.MatchText): | ||
return bool(covid_symptom_cuis.intersection({attr.cui for attr in m.conceptAttributes})) | ||
|
||
- matches = list(filter(is_covid_match, matches)) | ||
+ matches = filter(is_covid_match, matches) | ||
|
||
# For better reliability when regression/unit testing, sort matches by begin / first code. | ||
# (With stable sorting, we want the primary sort to be done last.) | ||
matches = sorted(matches, key=lambda x: x.conceptAttributes and x.conceptAttributes[0].code) | ||
matches = sorted(matches, key=lambda x: x.begin) | ||
Comment on lines -76 to +81: This is the only actual code change. It just sorts the cTAKES results so we can safely compare them over time.
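The two-pass sort in this hunk relies on `sorted()` being stable: sorting by the secondary key (first code) first and the primary key (`begin`) last gives the same order as one sort on the tuple `(begin, first code)`. A small sketch of that equivalence follows. `Attr` and `Match` are hypothetical stand-ins for the ctakesclient types, and `first_code` uses an empty-string fallback for matches without attributes, which is a variation on the key used in the diff rather than a copy of it.

```python
from dataclasses import dataclass, field


@dataclass
class Attr:
    code: str


@dataclass
class Match:
    begin: int
    conceptAttributes: list = field(default_factory=list)


def first_code(match):
    """Secondary sort key: the first attribute's code, or "" if there is none."""
    return match.conceptAttributes[0].code if match.conceptAttributes else ""


def sort_two_pass(matches):
    """Stable sort twice: secondary key first, primary key last."""
    matches = sorted(matches, key=first_code)
    return sorted(matches, key=lambda m: m.begin)


def sort_single_pass(matches):
    """Equivalent single sort on a (primary, secondary) tuple key."""
    return sorted(matches, key=lambda m: (m.begin, first_code(m)))


matches = [
    Match(608, [Attr("n/a")]),
    Match(6, [Attr("n/a")]),
    Match(608, [Attr("248274002")]),
    Match(6, [Attr("25064002")]),
]
result = [(m.begin, first_code(m)) for m in sort_two_pass(matches)]
print(result)
# [(6, '25064002'), (6, 'n/a'), (608, '248274002'), (608, 'n/a')]
```

Within each `begin` value, the numeric SNOMED codes sort before `"n/a"` because digits compare lower than letters in string ordering. That matches the order of the `.0` and `.1` rows in the expected output below.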
||
|
||
# OK we have cTAKES symptoms. But let's also filter through cNLP transformers to remove any that are negated | ||
# there too. We have found this to yield better results than cTAKES alone. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
{ | ||
"groups": [ | ||
"032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156", | ||
"05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", | ||
"13e748c21a7c50f6c59fc4613683cd5d7f76bd5d68fda20f4e81ccce74ea7930", | ||
"364aa545eca0a9744bc67c5ad914e2e9e35dd39a5c1f1a8f902e533a8641238d", | ||
"36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8" | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{"id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156.0", "docref_id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 608, "end": 615, "text": "fatigue", "polarity": 0, "conceptAttributes": [{"code": "248274002", "cui": "C0015672", "codingScheme": "SNOMEDCT_US", "tui": "T184"}, {"code": "84229001", "cui": "C0015672", "codingScheme": "SNOMEDCT_US", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156.1", "docref_id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 608, "end": 615, "text": "fatigue", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0015672", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156.2", "docref_id": "032b2ff6af8c883760d5a44e32ff80454d69551de6438c46be64604ddc744156", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 812, "end": 821, "text": "headaches", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0018681", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e.0", "docref_id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 6, "end": 14, "text": "Headache", "polarity": 0, "conceptAttributes": [{"code": "25064002", "cui": "C0018681", "codingScheme": "SNOMEDCT_US", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e.1", "docref_id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 6, "end": 14, "text": "Headache", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0018681", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e.2", "docref_id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 114, "end": 133, "text": "nausea and vomiting", "polarity": 0, "conceptAttributes": [{"code": "16932000", "cui": "C0027498", "codingScheme": "SNOMEDCT_US", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e.3", "docref_id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 114, "end": 133, "text": "nausea and vomiting", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0027498", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e.4", "docref_id": "05d0686aec0a65069a1e5b1a4937f5196b75ae336b7fbe10300882184523f95e", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 603, "end": 611, "text": "fatigued", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0015672", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "13e748c21a7c50f6c59fc4613683cd5d7f76bd5d68fda20f4e81ccce74ea7930.2", "docref_id": "13e748c21a7c50f6c59fc4613683cd5d7f76bd5d68fda20f4e81ccce74ea7930", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 303, "end": 318, "text": "short of breath", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0013404", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.0", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 343, "end": 348, "text": "cough", "polarity": 0, "conceptAttributes": [{"code": "263731006", "cui": "C0010200", "codingScheme": "SNOMEDCT_US", "tui": "T184"}, {"code": "49727002", "cui": "C0010200", "codingScheme": "SNOMEDCT_US", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.1", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 343, "end": 348, "text": "cough", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0010200", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.2", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 350, "end": 356, "text": "fevers", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0015967", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.3", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 372, "end": 378, "text": "chills", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0085593", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.4", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 1536, "end": 1541, "text": "fever", "polarity": 0, "conceptAttributes": [{"code": "386661006", "cui": "C0015967", "codingScheme": "SNOMEDCT_US", "tui": "T184"}, {"code": "50177009", "cui": "C0015967", "codingScheme": "SNOMEDCT_US", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8.5", "docref_id": "36ecd07bc327bba4e5ea36e34e66ca7f4f54360aef5bbcafc745c9f144aa87f8", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4, "match": {"begin": 1536, "end": 1541, "text": "fever", "polarity": 0, "conceptAttributes": [{"code": "n/a", "cui": "C0015967", "codingScheme": "custom", "tui": "T184"}], "type": "SignSymptomMention"}} | ||
{"id": "364aa545eca0a9744bc67c5ad914e2e9e35dd39a5c1f1a8f902e533a8641238d.0", "docref_id": "364aa545eca0a9744bc67c5ad914e2e9e35dd39a5c1f1a8f902e533a8641238d", "encounter_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "subject_id": "827db3458e3d956437c2b43f441eca441851c2f2e937e2c5467fdd0c5f980db5", "task_version": 4} |
Review comment: I updated a lot of actions here, but nothing interesting - just Node 20 upgrades.