feat: report download progress · ImagingDataCommons/idc-index@1d607e8

feat: report download progress

Download progress cannot be tracked sequentially, because the
s5cmd run or cp command blocks the thread it runs on.
Progress reporting is therefore moved to a separate thread
that runs alongside the download thread, in both the series
and the manifest downloaders.
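
A minimal sketch of the idea above, not the code in this commit: a monitor thread polls the size of the target directory while the blocking download runs. The names `dir_size`, `track_progress`, `fake_download` and `download_with_progress` are hypothetical, and `fake_download` is a stand-in for the blocking s5cmd call.

```python
import os
import tempfile
import threading
import time


def dir_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or is not readable yet
    return total


def track_progress(path, expected_bytes, done, samples, interval=0.05):
    """Record the downloaded fraction until the `done` event is set."""
    while not done.is_set():
        samples.append(min(dir_size(path) / expected_bytes, 1.0))
        done.wait(interval)
    # One last sample after the download has finished.
    samples.append(min(dir_size(path) / expected_bytes, 1.0))


def fake_download(path, n_files, file_bytes):
    """Hypothetical stand-in for the blocking s5cmd call."""
    for i in range(n_files):
        with open(os.path.join(path, f"{i}.dcm"), "wb") as f:
            f.write(b"\0" * file_bytes)
        time.sleep(0.02)


def download_with_progress(n_files=5, file_bytes=1024):
    """Run the download and the progress monitor side by side."""
    samples = []
    done = threading.Event()
    with tempfile.TemporaryDirectory() as target:
        monitor = threading.Thread(
            target=track_progress,
            args=(target, n_files * file_bytes, done, samples),
        )
        monitor.start()
        try:
            fake_download(target, n_files, file_bytes)
        finally:
            done.set()
            monitor.join()
    return samples
```

The expected size has to be known before the download starts, which is why the size lookup described next matters.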

As the index only contains AWS URLs, when a manifest
contains GCS URLs the CRDC series instance UUID is extracted
from the URLs and queried against the AWS URLs in the index
to get the download size. For a manifest download, the total
download size is calculated first and progress is tracked
against that total.
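
The lookup described above could be sketched as follows. This is an illustration under assumptions, not the commit's implementation: the URL shape `<scheme>://<bucket>/<uuid>/*`, the bucket names in the test data, and the helpers `series_uuid` and `manifest_size_mb` are all hypothetical.

```python
import re

# Assumed URL shape: <scheme>://<bucket>/<crdc_series_uuid>/...
UUID_RE = re.compile(
    r"^(?:s3|gs)://[^/]+/"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:/|$)"
)


def series_uuid(url):
    """Extract the series UUID from an s3:// or gs:// URL."""
    match = UUID_RE.match(url.strip())
    if match is None:
        raise ValueError(f"no series UUID found in {url!r}")
    return match.group(1)


def manifest_size_mb(manifest_urls, index_rows):
    """Sum the sizes of all series in a manifest.

    index_rows is an iterable of (aws_url, size_mb) pairs standing in
    for the index; the UUID is the join key, so GCS manifest URLs can
    be matched against AWS index URLs.
    """
    size_by_uuid = {series_uuid(url): size for url, size in index_rows}
    total = 0.0
    for url in manifest_urls:
        uuid = series_uuid(url)
        if uuid not in size_by_uuid:
            raise KeyError(f"series {uuid} is not in the index")
        total += size_by_uuid[uuid]
    return total
```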

The manifest validator in download from manifest is
moved to a dedicated function that checks every line
rather than only the first, and raises an exception if
the manifest has URLs from both GCP and AWS.
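
A validator along these lines might look like the sketch below. It assumes each manifest line contains one `s3://` or `gs://` URL, possibly surrounded by other tokens such as `cp` and a destination; the function name `validate_manifest` and the choice of `ValueError` are hypothetical.

```python
def validate_manifest(lines):
    """Check every manifest line and return the single provider used.

    Raises ValueError if a line has no recognised URL, or if the
    manifest mixes AWS (s3://) and GCP (gs://) URLs.
    """
    providers = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Take the first whitespace-separated token that looks like a URL.
        url = next((tok for tok in line.split() if "://" in tok), None)
        if url is None:
            raise ValueError(f"line {lineno}: no URL found")
        if url.startswith("s3://"):
            providers.add("aws")
        elif url.startswith("gs://"):
            providers.add("gcp")
        else:
            raise ValueError(f"line {lineno}: unsupported URL {url!r}")
        if len(providers) > 1:
            raise ValueError(f"line {lineno}: manifest mixes AWS and GCP URLs")
    if not providers:
        raise ValueError("manifest contains no URLs")
    return providers.pop()
```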

s5cmd cp is replaced with sync so that data already
downloaded is not downloaded again.
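
As an illustration of the switch, the helper below builds the argument list for either subcommand. It is a sketch only: `s5cmd_command` is a hypothetical name, the endpoint value in the example is an assumption, and the exact flags used in this commit are not shown in the message.

```python
def s5cmd_command(source_url, dest_dir, endpoint_url, resume=True):
    """Build an s5cmd argument list for an anonymous download.

    With resume=True the `sync` subcommand is used, which skips objects
    already present at the destination; `cp` copies them again.
    """
    verb = "sync" if resume else "cp"
    return [
        "s5cmd",
        "--no-sign-request",
        "--endpoint-url",
        endpoint_url,
        verb,
        source_url,
        dest_dir,
    ]
```

The resulting list can be passed to `subprocess.run` on a machine where s5cmd is installed.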

The get functions now return a message stating that no data
was found for the values given for a key.

The queries folder is removed, as the queries will be kept in
idc-index-data.

vkt1414 committed Mar 24, 2024

1 parent 1868f8a commit 1d607e8
