- Track download progress in a separate thread. Progress cannot be tracked sequentially, because the s5cmd run or cp command blocks the thread it runs on. Progress tracking is therefore offloaded to a thread that runs alongside the download thread, in both the series downloader and the manifest downloader.
- For manifest downloads, the total download size is calculated first, and progress is tracked against that total.
- The index only contains AWS URLs. When a manifest contains GCS URLs, the CRDC series instance UUID is extracted from the GCS URLs and matched against the AWS URLs in the index to get the download size.
- Manifest validation in download from manifest is moved into a dedicated function. It checks every line, not only the first, and raises an exception if the manifest mixes GCP and AWS URLs.
- s5cmd cp is replaced with sync, so that data already present locally is not downloaded again.
- The get functions now return a "data not found" message when nothing matches the values given for a key.
- The queries folder is removed, since the queries will persist in idc-index-data.
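The threading approach described above can be sketched as follows. This is a minimal illustration, not the idc-index implementation: the names download_with_progress, track_progress and directory_size are hypothetical, and measuring progress by polling the size of the destination directory is an assumption. The blocking download (for example, a subprocess call to s5cmd) runs in the caller's thread while a second thread reports progress against a precomputed total.

```python
import os
import threading
from pathlib import Path
from typing import Callable


def directory_size(path: Path) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may be renamed or removed mid-download; skip it.
                pass
    return total


def track_progress(dest: Path, total_bytes: int, done: threading.Event,
                   report: Callable[[int, int], None], interval: float = 0.1) -> None:
    """Hypothetical tracker: poll dest until done is set, reporting (downloaded, total)."""
    while not done.is_set():
        report(min(directory_size(dest), total_bytes), total_bytes)
        done.wait(interval)  # sleeps, but wakes early once the download finishes
    # One final report after the download thread has finished.
    report(min(directory_size(dest), total_bytes), total_bytes)


def download_with_progress(download: Callable[[], None], dest: Path, total_bytes: int,
                           report: Callable[[int, int], None]) -> None:
    """Run a blocking download while a separate thread tracks its progress."""
    done = threading.Event()
    tracker = threading.Thread(target=track_progress,
                               args=(dest, total_bytes, done, report), daemon=True)
    tracker.start()
    try:
        download()  # assumed to block, e.g. subprocess.run(["s5cmd", ...])
    finally:
        done.set()
        tracker.join()
```

Polling the filesystem keeps the tracker independent of the downloader's output format. The trade-off is that partially written files count toward progress before they are complete.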
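A validator that checks every manifest line and rejects mixed providers might look like the sketch below. It assumes that each non-empty manifest line contains exactly one s3:// or gs:// URL, in the form "cp s3://bucket/uuid/* .". The function and exception names are illustrative only.

```python
import re
from typing import Iterable

# Assumed manifest line shape: "cp s3://<bucket>/<crdc_series_uuid>/* ." (or gs://).
URL_PATTERN = re.compile(r"\b(s3|gs)://\S+")


class ManifestValidationError(ValueError):
    """Hypothetical error for malformed manifests or manifests that mix providers."""


def validate_manifest(lines: Iterable[str]) -> str:
    """Check every non-empty line and return the single URL scheme used ("s3" or "gs")."""
    schemes = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        match = URL_PATTERN.search(line)
        if match is None:
            raise ManifestValidationError(f"line {lineno}: no s3:// or gs:// URL: {line!r}")
        schemes.add(match.group(1))
    if not schemes:
        raise ManifestValidationError("manifest contains no URLs")
    if len(schemes) > 1:
        raise ManifestValidationError("manifest mixes AWS (s3://) and GCS (gs://) URLs")
    return schemes.pop()
```

The validator returns the scheme it found, so the caller can choose the endpoint and the size lookup in one step.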
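Because the index only holds AWS URLs, a GCS manifest can be sized by matching on the CRDC series UUID. The sketch below assumes that the UUID is the first path segment after the bucket in both AWS and GCS URLs. It also stands in a plain dict (AWS URL to size in bytes) for the real index, and takes URLs that have already been extracted from the manifest lines. All names and bucket values are hypothetical.

```python
from typing import Dict, Iterable


def crdc_series_uuid_from_url(url: str) -> str:
    """Extract the series UUID, assumed to be the path segment right after the bucket."""
    # e.g. "gs://bucket/<uuid>/*" -> "<uuid>"; scheme and bucket are dropped.
    _scheme, rest = url.split("://", 1)
    parts = rest.split("/")
    if len(parts) < 2 or not parts[1]:
        raise ValueError(f"no series UUID in URL: {url!r}")
    return parts[1]


def total_download_size(manifest_urls: Iterable[str], aws_index: Dict[str, int]) -> int:
    """Sum series sizes for manifest URLs, matching any provider's URL to the AWS-only index."""
    # Re-key the AWS index by UUID so that GCS URLs can be matched too.
    size_by_uuid = {crdc_series_uuid_from_url(u): size for u, size in aws_index.items()}
    total = 0
    for url in manifest_urls:
        uuid = crdc_series_uuid_from_url(url)
        if uuid not in size_by_uuid:
            raise KeyError(f"series {uuid} not found in index")
        total += size_by_uuid[uuid]
    return total
```

Computing this total before the download starts gives the progress tracker a fixed denominator for the whole manifest.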