-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add inlining options and a new inline
command
#374
base: main
Are you sure you want to change the base?
Conversation
☂️ Python Coverage
Overall Coverage
New Files
Modified Files
|
2882da0
to
cce70c5
Compare
@@ -86,7 +90,15 @@ async def __aexit__(self, exc_type, exc_value, traceback): | |||
await self._session.aclose() | |||
|
|||
async def request( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me lightly explain some of the refactoring here.
Before, FhirClient.request was a single request. The retry logic was in the bulk export code, which retried on both error cases and the expected 202
status code for "still working on it".
Now, I've separated the bulk export retry logic into error-path and 202-path, and moved the error-path into this FhirClient.request method. So now any code making requests to the server has easy retries (enabled by default for any request).
Inlining here means downloading attachment URLs and insert them as attachment data, allowing you to download the attachment just once, at the cost of extra storage space. This is often useful in preparation for NLP tasks, where you want to often refer back to DocumentReference clinical notes, but do not want to constantly deal with the flakiness of an EHR server. (Or losing access to that server.) New features: - There is a new `inline` command that can inline an existing folder of NDJSON. - The `export` and `etl` commands both now accept an opt-in flag to inline data after performing a bulk export. - If the server provides a Retry-After header that is a timestamp (rather than a number of seconds), we now parse that correctly. - All server requests are retried at least a little bit upon errors. (previously, only bulk export requests were retried - now every time we hit the server, so even for Medication downloads or DocRef attachment downloads during NLP tasks. Behavior changes: - If the server gives us a 429 error, we now log it as an error message, and don't log a successful progress call. - If the server gives us a 429 error, we use the next exponential backoff delay instead of hard coding 60 seconds as the delay. - If the server gives us a Retry-After header on an error message, we no longer unconditionally accept and use it. Rather the requested delay is capped by our next exponential backoff delay. That is, the server's Retry-After time will be used if it's LESS than our next backoff, but if it's longer, we'll still use our own backoff. This is lightly hostile, but (a) it's only done on error cases, (b) our backoff times are generous, and counted in minutes not seconds, and (c) it lets us guarantee a max delay time for callers. - Instead of retrying on 429 and ALL 5xx errors, there's a specific list of error codes that we retry on. Currently it's 408, 429, 500, 502, 503, and 504. - Have the bulk exporter time out after 30 days, rather than one. We've seen Epic exports take a couple weeks.
Inlining here means downloading attachment URLs and insert them as attachment data, allowing you to download the attachment just once, at the cost of extra storage space.
This is often useful in preparation for NLP tasks, where you want to often refer back to DocumentReference clinical notes, but do not want to constantly deal with the flakiness of an EHR server. (Or losing access to that server.)
New features:
inline
command that can inline an existing folder of NDJSON.export
andetl
commands both now accept an opt-in flag to inline data after performing a bulk export.Behavior changes:
Checklist
docs/
) needs to be updated