
cildataconverter.py

Chris Churas edited this page Jan 25, 2018 · 3 revisions

This script takes a directory of downloaded image and video datasets created by cildatadownloader.py and converts the data files as defined in the requirements document.
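The help output below says that downloaddir should contain images/ and videos/ subdirectories. The following is a minimal sketch of how such a tree could be walked. It assumes, hypothetically, that each dataset has its own <ID> directory holding <ID>.raw. The function name find_datasets is illustrative and is not part of the script. Its only_id and skip_if_raw_missing parameters mimic the --id and --skipifrawmissing options.

```python
import os
from typing import Iterator, Optional, Tuple


def find_datasets(downloaddir: str,
                  only_id: Optional[str] = None,
                  skip_if_raw_missing: bool = False,
                  ) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (kind, dataset_id, raw_path) for each dataset found.

    kind is "images" or "videos"; raw_path is None when <ID>.raw is absent.
    ASSUMPTION: layout is downloaddir/<kind>/<ID>/<ID>.raw.
    """
    for kind in ("images", "videos"):
        kind_dir = os.path.join(downloaddir, kind)
        if not os.path.isdir(kind_dir):
            continue  # tolerate a missing images/ or videos/ subdirectory
        for dataset_id in sorted(os.listdir(kind_dir)):
            if not os.path.isdir(os.path.join(kind_dir, dataset_id)):
                continue  # ignore stray files next to the dataset directories
            if only_id is not None and dataset_id != only_id:
                continue  # behaves like --id ID
            raw_path = os.path.join(kind_dir, dataset_id, dataset_id + ".raw")
            if not os.path.isfile(raw_path):
                if skip_if_raw_missing:
                    continue  # behaves like --skipifrawmissing
                raw_path = None
            yield kind, dataset_id, raw_path
```

A caller would loop with something like "for kind, ds_id, raw in find_datasets(path, skip_if_raw_missing=True)". It would then dispatch to image or video handling based on kind.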

Output when passing --help on the command line:

usage: cildataconverter.py [-h] [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                           [--id ID] [--onlycheckzipfiles]
                           [--skipifrawmissing] [--version]
                           downloaddir

              Version 0.1.0

              Given a directory of images and videos downloaded by
              cildatadownloader.py, this script performs a set of
              file renames and conversions on each dataset found within.

              The only required argument is the output directory
              generated by cildatadownloader.py, which should contain
              two subdirectories: images/ and videos/.

              IMAGES:

              For datasets under the images/ directory, the following
              conversions are performed:

              <ID>.raw is examined and verified to be a valid zip file.

              Any files found in <ID>.raw are extracted and written
              to the <ID> directory as <ID>_orig.<FORMAT>, where
              <FORMAT> is the file's extension inside <ID>.raw.

              <ID>.zip is created with the following structure:
                 <ID>/
                      <ID>_orig.<FORMAT>
                      <ID>_orig.<FORMAT>

              If the zip file is expected to exceed 4 GB, 64-bit
              (ZIP64) extensions are enabled.

              The json file is updated with correct md5 checksums,
              file sizes, and mime_types, as determined by the HTTP
              headers received at download and/or by file extension.

              VIDEOS:

              For datasets under the videos/ directory, the following
              conversions are performed:

              <ID>.raw is renamed to <ID>.<MIMETYPE SUFFIX>, where
              the suffix is derived from the MIME type reported in
              the HTTP headers when the file was downloaded.

              <ID>.zip is created with the following structure:
                 <ID>/
                      <ID>.<MIMETYPE SUFFIX>

              The json file is updated with correct md5 checksums,
              file sizes, and mime_types, as determined by the HTTP
              headers received at download and/or by file extension.

              For more information visit:

              https://github.com/slash-segmentation/cildata_util/wiki
    

positional arguments:
  downloaddir           Directory where images and videos reside

optional arguments:
  -h, --help            show this help message and exit
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level (default WARNING)
  --id ID               Only convert data with id passed in.
  --onlycheckzipfiles   If set, examines all the zip files in images/ and
                        reports the number of files and file names found.
  --skipifrawmissing    If set, skips any IDs where no raw file is found in
                        the directory.
  --version             show program's version number and exit
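The IMAGES section of the help text describes three steps. First, <ID>.raw is checked to be a zip. Next, its members are extracted as <ID>_orig.<FORMAT>. Finally, <ID>.zip is built with an <ID>/ folder inside.

The following is a minimal sketch of those steps, not the script's own code. The function name and the location of <ID>.raw inside dataset_dir are assumptions. The sketch also assumes at most one member per extension, which the <ID>_orig.<FORMAT> naming implies. Instead of an explicit "looks to exceed 4gb" test, it passes allowZip64=True and relies on Python's zipfile to add ZIP64 records when sizes get too large.

```python
import os
import shutil
import zipfile


def convert_image_dataset(dataset_dir: str, dataset_id: str) -> str:
    """Sketch of the IMAGES steps; returns the path of the new <ID>.zip.

    ASSUMPTIONS: <ID>.raw lives in dataset_dir, and it holds at most one
    file per extension (otherwise two files would map to the same name).
    """
    raw_path = os.path.join(dataset_dir, dataset_id + ".raw")
    if not zipfile.is_zipfile(raw_path):
        raise ValueError(raw_path + " is not a valid zip file")

    out_dir = os.path.join(dataset_dir, dataset_id)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(raw_path) as raw:
        for info in raw.infolist():
            if info.is_dir():
                continue
            # <FORMAT> is the member's extension, lowercased, e.g. "tif".
            fmt = os.path.splitext(info.filename)[1].lstrip(".").lower()
            dest = os.path.join(out_dir, "%s_orig.%s" % (dataset_id, fmt))
            if dest in written:
                raise ValueError("more than one .%s file in %s" % (fmt, raw_path))
            with raw.open(info) as src, open(dest, "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream; avoids loading big images
            written.append(dest)

    zip_path = os.path.join(dataset_dir, dataset_id + ".zip")
    # With allowZip64=True, zipfile adds ZIP64 records only when sizes,
    # offsets or entry counts get too large for the classic zip format.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED,
                         allowZip64=True) as out:
        for path in written:
            out.write(path, arcname=dataset_id + "/" + os.path.basename(path))
    return zip_path
```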
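Both the IMAGES and VIDEOS sections end by refreshing md5 checksums, file sizes, and mime_types in the dataset's json file. The following is a hedged sketch of that bookkeeping.

The real json schema is not shown on this page, so the key names used here are hypothetical. These are "md5", "file_size", "mime_type", and a top-level "files" list. The application/octet-stream fallback is also an assumption. Following the help text, a MIME type taken from the download headers wins over a guess from the file extension.

```python
import hashlib
import json
import mimetypes
import os
from typing import List, Optional


def file_metadata(path: str, header_mime: Optional[str] = None) -> dict:
    """Return md5, size and MIME type for one file (HYPOTHETICAL key names)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-GB files never sit in memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    # Prefer the Content-Type seen at download time, else guess from extension.
    mime = (header_mime or mimetypes.guess_type(path)[0]
            or "application/octet-stream")
    return {"md5": md5.hexdigest(),
            "file_size": os.path.getsize(path),
            "mime_type": mime}


def update_json(json_path: str, file_paths: List[str]) -> None:
    """Store fresh metadata under a HYPOTHETICAL "files" key of json_path."""
    with open(json_path) as f:
        data = json.load(f)
    data["files"] = [dict(name=os.path.basename(p), **file_metadata(p))
                     for p in file_paths]
    tmp_path = json_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp_path, json_path)  # swap in the new file in one step
```

Writing to a temporary file and then calling os.replace means an interrupted run leaves the old json intact rather than a truncated one.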
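The VIDEOS section renames <ID>.raw to <ID>.<MIMETYPE SUFFIX> and wraps it in <ID>.zip under an <ID>/ folder. The following sketch does both steps, assuming the Content-Type string from download time is available to the caller.

The VIDEO_SUFFIXES table is a hypothetical choice, not taken from the script. Any other type falls back to Python's mimetypes database, whose answers can vary with the platform's mime.types files.

```python
import mimetypes
import os
import zipfile

# HYPOTHETICAL table for common video types; anything else falls back to
# mimetypes.guess_extension, whose result can vary by platform.
VIDEO_SUFFIXES = {
    "video/mp4": "mp4",
    "video/quicktime": "mov",
    "video/x-msvideo": "avi",
    "video/mpeg": "mpg",
}


def mime_suffix(content_type: str) -> str:
    """Map a Content-Type header value to a file suffix without the dot."""
    mime = content_type.split(";")[0].strip().lower()  # drop "; codecs=..." etc.
    if mime in VIDEO_SUFFIXES:
        return VIDEO_SUFFIXES[mime]
    ext = mimetypes.guess_extension(mime)
    if ext is None:
        raise ValueError("no known suffix for MIME type %r" % content_type)
    return ext.lstrip(".")


def convert_video_dataset(dataset_dir: str, dataset_id: str,
                          content_type: str) -> str:
    """Rename <ID>.raw by MIME type, then zip it as <ID>/<ID>.<suffix>."""
    raw_path = os.path.join(dataset_dir, dataset_id + ".raw")
    name = "%s.%s" % (dataset_id, mime_suffix(content_type))
    renamed = os.path.join(dataset_dir, name)
    os.rename(raw_path, renamed)
    zip_path = os.path.join(dataset_dir, dataset_id + ".zip")
    # Video streams are already compressed, so store them as-is.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED,
                         allowZip64=True) as out:
        out.write(renamed, arcname=dataset_id + "/" + name)
    return zip_path
```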
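The --onlycheckzipfiles option reports the number of files and the file names found in each zip under images/. The following is a minimal sketch of such a check, not the script's implementation.

The function names are hypothetical. The sketch walks the tree recursively because the exact on-disk layout is an assumption. It marks any .zip that zipfile cannot read with None, so a broken archive is not confused with an empty one.

```python
import os
import zipfile
from typing import Dict, List, Optional


def check_zip_files(images_dir: str) -> Dict[str, Optional[List[str]]]:
    """Map every *.zip under images_dir to the names of the files it holds.

    A value of None marks a .zip that is not a readable zip archive.
    """
    report: Dict[str, Optional[List[str]]] = {}
    for root, _dirs, files in os.walk(images_dir):
        for name in files:
            if not name.lower().endswith(".zip"):
                continue
            path = os.path.join(root, name)
            if not zipfile.is_zipfile(path):
                report[path] = None
                continue
            with zipfile.ZipFile(path) as zf:
                report[path] = [i.filename for i in zf.infolist()
                                if not i.is_dir()]
    return report


def print_report(report: Dict[str, Optional[List[str]]]) -> None:
    """Print one summary line per archive, then its member names."""
    for path in sorted(report):
        names = report[path]
        if names is None:
            print("%s: not a valid zip file" % path)
            continue
        print("%s: %d file(s)" % (path, len(names)))
        for member in names:
            print("    " + member)
```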
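The usage text above has the layout that Python's argparse prints: "usage:", "positional arguments:", and "optional arguments:". A parser accepting the same options can therefore be sketched as follows.

This is a reconstruction from the help output, not the script's source. The prog name and help strings are copied from the page. The exact string printed by --version is an assumption.

```python
import argparse
import logging
from typing import List, Optional

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="cildataconverter.py",
        description="Version 0.1.0\n\nConverts datasets downloaded by "
                    "cildatadownloader.py.",
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("downloaddir",
                        help="Directory where images and videos reside")
    parser.add_argument("--log", choices=LOG_LEVELS, default="WARNING",
                        help="Set the logging level (default WARNING)")
    parser.add_argument("--id", help="Only convert data with id passed in.")
    parser.add_argument("--onlycheckzipfiles", action="store_true",
                        help="If set, examines all the zip files in images/ "
                             "and reports the number of files and file "
                             "names found.")
    parser.add_argument("--skipifrawmissing", action="store_true",
                        help="If set, skips any IDs where no raw file is "
                             "found in the directory.")
    # ASSUMPTION: the real --version output format is not shown on this page.
    parser.add_argument("--version", action="version",
                        version="%(prog)s 0.1.0")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Map the --log choice (e.g. "INFO") to the logging module's constant.
    logging.basicConfig(level=getattr(logging, args.log))
    logging.getLogger("cildataconverter").info("Converting %s",
                                               args.downloaddir)
    return 0
```

Passing choices makes argparse reject unknown --log values before any conversion starts. The {DEBUG,INFO,...} list in the usage line is exactly what argparse prints for a choices argument.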