ci: add full stack NLP regression test #295

Merged: 1 commit from mikix/nlp-regression into main, Jan 12, 2024

Conversation

@mikix (Contributor) commented on Jan 10, 2024:

This is overdue. We want to make sure that any NLP changes don't come as surprises.

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

@mikix force-pushed the mikix/nlp-regression branch 9 times, most recently from 6af91ae to 0c47a2a on January 11, 2024 at 20:44
tags: smartonfhir/cumulus-etl:latest

- name: Download NLP images
run: docker compose --profile covid-symptom up -d --quiet-pull
@mikix (author) commented on Jan 11, 2024:

It's not clear how best to cache the pulled images... Do you think GitHub does some network-side caching? There are some rando actions in the GitHub Actions marketplace that promise to cache pulled images, but it might be easy enough to do on our own too.

On the other hand... it's just GitHub's resources we're using, which makes me suspect they do some caching on their side just to save their own money, and maybe we don't have to be clever.

A contributor replied:

I would assume they've got something akin to Artifactory instance(s) inside their infrastructure. Since it's open source and we don't pay for CI time, I don't think we need to worry about optimizing. But if we were, it would probably look similar: add a cache box and update the Docker hosts to point at it?

-   uses: actions/checkout@v3
+   uses: actions/checkout@v4
@mikix (author) commented:

I updated a lot of actions here, but nothing interesting: just Node 20 upgrades.

Comment on lines -76 to +81
-   matches = list(filter(is_covid_match, matches))
+   matches = filter(is_covid_match, matches)
+
+   # For better reliability when regression/unit testing, sort matches by begin / first code.
+   # (With stable sorting, we want the primary sort to be done last.)
+   matches = sorted(matches, key=lambda x: x.conceptAttributes and x.conceptAttributes[0].code)
+   matches = sorted(matches, key=lambda x: x.begin)
@mikix (author) commented:

This is the only actual code change: just sorting the cTAKES results so we can safely compare over time.
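
For anyone reading along, here is a minimal standalone sketch of the two-pass stable sort being described. The Match tuple below is a hypothetical stand-in for the real cTAKES match objects (which carry .begin and .conceptAttributes[n].code), not the actual class:

    # Hypothetical stand-in for the real cTAKES match objects; not the actual class.
    from collections import namedtuple

    Match = namedtuple("Match", ["begin", "code"])

    matches = [Match(10, "B"), Match(5, "Z"), Match(10, "A"), Match(5, "Q")]

    # Secondary key first: order by code...
    matches = sorted(matches, key=lambda x: x.code)
    # ...then the primary key. Python's sort is stable, so matches that tie
    # on .begin keep the code ordering from the previous pass.
    matches = sorted(matches, key=lambda x: x.begin)

    print(matches)
    # [Match(begin=5, code='Q'), Match(begin=5, code='Z'),
    #  Match(begin=10, code='A'), Match(begin=10, code='B')]

Sorting twice like this gives a fully deterministic ordering, which is what lets the regression test diff today's output against a stored snapshot.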

This is overdue. We want to make sure that any NLP changes are not
surprises.

Questions that we now have PR-time answers for:
- Does our Dockerfile build? (was only checked after merge before)
- Does our built Docker work even a little bit?
- Do the current NLP-dependent images work even a little bit?
- Are there unexpected regressions in our NLP pipeline?

There might still be errors that could creep into our NLP that this
quick smoketest doesn't uncover. But it's a lot better than nothing!
@mikix force-pushed the mikix/nlp-regression branch from 0c47a2a to b143b5d on January 11, 2024 at 20:51
@mikix changed the title from "WIP: ci: add full stack NLP regression test" to "ci: add full stack NLP regression test" on Jan 11, 2024
@mikix marked this pull request as ready for review on January 11, 2024 at 20:51

sed -i 's/"generated_on": "[^"]*", //g' $OUTDIR/*.ndjson
diff -upr $DATADIR/expected-output $OUTDIR

echo "All Good!"
A contributor commented:

no emojis? i am SHOCKED

@mikix (author) replied:

I thought about it! 😄
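
For reference, the normalize-then-compare step in the snippet above (strip the volatile generated_on timestamp, then diff against the stored expected output) amounts to roughly the following Python sketch. This is only an illustration; the real check is the sed + diff pair in the workflow, and the directory layout and function names here are assumptions:

    import json
    from pathlib import Path

    def normalize(path: Path) -> list[dict]:
        """Load an NDJSON file, dropping fields that change on every run."""
        rows = []
        for line in path.read_text().splitlines():
            row = json.loads(line)
            row.pop("generated_on", None)  # timestamp differs on every ETL run
            rows.append(row)
        return rows

    def assert_no_regressions(expected_dir: Path, output_dir: Path) -> None:
        """Compare every expected NDJSON file against its freshly generated twin."""
        for expected in sorted(expected_dir.glob("*.ndjson")):
            actual = output_dir / expected.name
            assert normalize(expected) == normalize(actual), f"regression in {expected.name}"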

@mikix merged commit 0d0db81 into main on Jan 12, 2024
3 checks passed
@mikix deleted the mikix/nlp-regression branch on January 12, 2024 at 14:22