feat: report download progress
There is no way to track download progress sequentially,
because the s5cmd run or cp command blocks the thread it
runs on. Progress reporting is therefore offloaded to a
separate thread that runs alongside the download thread
in both the series and the manifest downloaders.
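Below is a minimal sketch of that threading pattern. It assumes progress is measured by polling the bytes written to the destination directory, which may differ from what the commit does. The blocking download command runs in the calling thread while a monitor thread reports progress. The names run_with_progress and dir_size are hypothetical, and cmd can be any argument list, for example an s5cmd invocation. This is not the commit's actual implementation.

```python
import os
import subprocess
import threading
from typing import Callable, List


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may vanish or be renamed mid-download; skip it.
                pass
    return total


def run_with_progress(
    cmd: List[str],
    dest_dir: str,
    total_bytes: int,
    report: Callable[[int, int], None],
    interval: float = 0.5,
) -> int:
    """Run cmd (hypothetically an s5cmd call) while a second thread polls
    dest_dir and reports bytes on disk against total_bytes."""
    done = threading.Event()

    def monitor() -> None:
        # wait() returns True once done is set, which ends the loop.
        while not done.wait(interval):
            report(dir_size(dest_dir), total_bytes)
        report(dir_size(dest_dir), total_bytes)  # final reading

    watcher = threading.Thread(target=monitor, daemon=True)
    watcher.start()
    try:
        # Blocks this thread until the download process exits.
        returncode = subprocess.run(cmd, check=False).returncode
    finally:
        done.set()
        watcher.join()
    return returncode
```

Polling the filesystem leaves the download command untouched. The trade-off is that partially written files already count toward progress.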

Because the index only contains AWS URLs, when a manifest
contains GCS URLs, the CRDC series instance UUID is
extracted from each GCS URL and matched against the AWS
URLs in the index to get the download size. For manifest
downloads, the total download size is calculated first
and progress is tracked against that total.
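The UUID lookup can be sketched as follows. The sketch assumes each series URL embeds the CRDC series UUID as a path component, for example <scheme>://<bucket>/<uuid>/*. It also models the index as a hypothetical dict that maps AWS URL to size in MB. Re-keying the index by UUID lets GCS URLs be matched, and the resulting sum is the total that manifest progress would be tracked against. The function names are illustrative only.

```python
import re
from typing import Dict, Iterable

# Matches a canonical 8-4-4-4-12 hex UUID anywhere in a URL.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_series_uuid(url: str) -> str:
    """Return the first UUID found in url, lower-cased."""
    match = UUID_RE.search(url)
    if match is None:
        raise ValueError(f"no CRDC series UUID found in {url!r}")
    return match.group(0).lower()


def total_download_size_mb(
    manifest_urls: Iterable[str], aws_url_to_size_mb: Dict[str, float]
) -> float:
    """Sum series sizes for manifest URLs (AWS or GCS) by matching the
    UUID against the AWS URLs of the (hypothetical) index mapping."""
    # Re-key the index by UUID so GCS URLs can be looked up too.
    size_by_uuid = {
        extract_series_uuid(aws_url): size
        for aws_url, size in aws_url_to_size_mb.items()
    }
    # A UUID missing from the index raises KeyError.
    return sum(size_by_uuid[extract_series_uuid(u)] for u in manifest_urls)
```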

The manifest validation in download-from-manifest is
offloaded to a dedicated function that checks every
line, not only the first, and raises an exception if
the manifest mixes GCP and AWS URLs.
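Here is a sketch of a validator that inspects every line. It assumes the provider is identified by the s3:// or gs:// scheme that appears in each line. The name validate_manifest and its return values are hypothetical.

```python
from typing import Iterable


def validate_manifest(lines: Iterable[str]) -> str:
    """Check every non-blank line and return 'aws' or 'gcs'.

    Raises ValueError if a line references neither provider, if the
    manifest is empty, or if it mixes AWS (s3://) and GCS (gs://) URLs."""
    providers = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if "s3://" in line:
            providers.add("aws")
        elif "gs://" in line:
            providers.add("gcs")
        else:
            raise ValueError(f"line {lineno}: no s3:// or gs:// URL: {line!r}")
    if not providers:
        raise ValueError("manifest is empty")
    if len(providers) > 1:
        raise ValueError("manifest mixes AWS and GCS URLs")
    return providers.pop()
```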

s5cmd cp is replaced with sync so that data already
present at the destination is not downloaded again.
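The switch can be sketched as a rewrite of the leading verb. This assumes each manifest line is an s5cmd command of the form cp <src> <dst>. s5cmd sync only transfers objects that are missing or changed at the destination, and that behaviour is what avoids the repeated download. The helpers cp_to_sync and rewrite_manifest are hypothetical.

```python
from typing import Iterable, List


def cp_to_sync(line: str) -> str:
    """Rewrite an s5cmd 'cp <src> <dst>' line as 'sync <src> <dst>'.

    Lines that are not cp commands are returned unchanged."""
    stripped = line.lstrip()
    if stripped.startswith("cp "):
        return "sync " + stripped[len("cp "):]
    return line


def rewrite_manifest(lines: Iterable[str]) -> List[str]:
    """Apply cp_to_sync to every manifest line."""
    return [cp_to_sync(line) for line in lines]
```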

The get functions now return a message saying that no
data was found when none of the values given for a key match.
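The new behaviour can be illustrated with a hypothetical get_matching helper over a list of records. The key collection_id and the exact message wording are assumptions for illustration only, not the library's actual output.

```python
from typing import Any, Dict, Iterable, List, Union

Record = Dict[str, Any]


def get_matching(
    records: Iterable[Record], key: str, values: Iterable[Any]
) -> Union[List[Record], str]:
    """Return records whose key is in values, or a message if none match."""
    wanted = list(values)  # materialise once so it can be reused below
    matches = [r for r in records if r.get(key) in wanted]
    if not matches:
        return f"No data found for {key} with values {wanted}"
    return matches
```

Returning a string instead of an empty result keeps the sketch close to the commit's description. Callers then have to check the return type.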

The queries folder is removed, as the queries will
persist in idc-index-data.
vkt1414 committed Mar 24, 2024
1 parent 1868f8a commit 1d607e8
Showing 2 changed files with 304 additions and 166 deletions.