ci: add full stack NLP regression test #295

Merged: 1 commit from mikix/nlp-regression into main, Jan 12, 2024

Conversation

@mikix (Contributor) commented on Jan 10, 2024:

This is overdue. We want to make sure that any NLP changes don't come as surprises.

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

@mikix force-pushed the mikix/nlp-regression branch 9 times, most recently from 6af91ae to 0c47a2a on January 11, 2024 at 20:44
tags: smartonfhir/cumulus-etl:latest

- name: Download NLP images
run: docker compose --profile covid-symptom up -d --quiet-pull
@mikix (author) commented on Jan 11, 2024:

It's not clear how best to cache the pulled images... Do you think GitHub does some network-side caching? There are some rando actions in the GitHub Actions marketplace that promise to cache pulled images, but it might be easy enough to do on our own too.

On the other hand... it's just GitHub's resources we're using, which makes me suspect they do some caching on their side just to save their own money, and maybe we don't have to be clever.

A contributor replied:

I would assume they've got something akin to Artifactory instance(s) inside their infrastructure. Since it's open source and we don't pay for CI time, I don't think we need to worry about optimizing. But if we were, it would probably look similar: add a cache box and update the Docker hosts to point at it?

-   uses: actions/checkout@v3
+   uses: actions/checkout@v4
@mikix (author) commented:

I updated a lot of actions here, but nothing interesting: just Node 20 upgrades.

Comment on lines -76 to +81
-   matches = list(filter(is_covid_match, matches))
+   matches = filter(is_covid_match, matches)
+
+   # For better reliability when regression/unit testing, sort matches by begin / first code.
+   # (With stable sorting, we want the primary sort to be done last.)
+   matches = sorted(matches, key=lambda x: x.conceptAttributes and x.conceptAttributes[0].code)
+   matches = sorted(matches, key=lambda x: x.begin)
@mikix (author) commented:

This is the only actual code change: just sorting the cTAKES results so we can safely compare over time.
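
For anyone reading along, here is a minimal standalone sketch of the two-pass stable sort being described. The Match tuple below is a hypothetical stand-in for the real cTAKES match objects (which carry .begin and .conceptAttributes[n].code), not the actual class:

    # Hypothetical stand-in for the real cTAKES match objects; not the actual class.
    from collections import namedtuple

    Match = namedtuple("Match", ["begin", "code"])

    matches = [Match(10, "B"), Match(5, "Z"), Match(10, "A"), Match(5, "Q")]

    # Secondary key first: order by code...
    matches = sorted(matches, key=lambda x: x.code)
    # ...then the primary key. Python's sort is stable, so matches that tie
    # on .begin keep the code ordering from the previous pass.
    matches = sorted(matches, key=lambda x: x.begin)

    print(matches)
    # [Match(begin=5, code='Q'), Match(begin=5, code='Z'),
    #  Match(begin=10, code='A'), Match(begin=10, code='B')]

Sorting twice like this gives a fully deterministic ordering, which is what lets the regression test diff today's output against a stored snapshot.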

This is overdue. We want to make sure that any NLP changes are not
surprises.

Questions that we now have PR-time answers for:
- Does our Dockerfile build? (was only checked after merge before)
- Does our built Docker work even a little bit?
- Do the current NLP-dependent images work even a little bit?
- Are there unexpected regressions in our NLP pipeline?

There might still be errors that could creep into our NLP that this
quick smoketest doesn't uncover. But it's a lot better than nothing!
@mikix force-pushed the mikix/nlp-regression branch from 0c47a2a to b143b5d on January 11, 2024 at 20:51
@mikix changed the title from "WIP: ci: add full stack NLP regression test" to "ci: add full stack NLP regression test" on Jan 11, 2024
@mikix marked this pull request as ready for review on January 11, 2024 at 20:51

sed -i 's/"generated_on": "[^"]*", //g' $OUTDIR/*.ndjson
diff -upr $DATADIR/expected-output $OUTDIR

echo "All Good!"
A contributor commented:

no emojis? i am SHOCKED

@mikix (author) replied:

I thought about it! 😄
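
For reference, the normalize-then-compare step in the snippet above (strip the volatile generated_on timestamp, then diff against the stored expected output) amounts to roughly the following Python sketch. This is only an illustration; the real check is the sed + diff pair in the workflow, and the directory layout and function names here are assumptions:

    import json
    from pathlib import Path

    def normalize(path: Path) -> list[dict]:
        """Load an NDJSON file, dropping fields that change on every run."""
        rows = []
        for line in path.read_text().splitlines():
            row = json.loads(line)
            row.pop("generated_on", None)  # timestamp differs on every ETL run
            rows.append(row)
        return rows

    def assert_no_regressions(expected_dir: Path, output_dir: Path) -> None:
        """Compare every expected NDJSON file against its freshly generated twin."""
        for expected in sorted(expected_dir.glob("*.ndjson")):
            actual = output_dir / expected.name
            assert normalize(expected) == normalize(actual), f"regression in {expected.name}"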

@mikix merged commit 0d0db81 into main on Jan 12, 2024
3 checks passed
@mikix deleted the mikix/nlp-regression branch on January 12, 2024 at 14:22