- Reduce memory requirement for encoding genotypes with large sample sizes
- Transpose default chunk sizes to 1000 variants and 10,000 samples (issue:300)
- Add chunksize options to mkschema (issue:294)
Breaking changes
- ICF metadata format version bumped to ensure long-term compatibility between numpy 1.26.x and numpy >= 2. Existing ICFs will need to be recreated.
Maintenance release:
- Pin numpy to < 2
- Pin Zarr to < 3
- Initial production-ready version.
- Add -Q/--no-progress flag to CLI
- Change num-partitions argument in dexplode-init and dencode-init to a named option.
- Change output format of dexplode-init and dencode-init
- Bugfix for progress on macOS, and change of multiprocessing startup strategy.
- Change on-disk format for explode and schema
- Support older tabix indexes
- Fix some bugs in explode
- Change on-disk format of distributed encode and simplify
- Check for all partitions nominally completed encoding before doing anything destructive in dencode-finalise
- Only use NOSHUFFLE by default on call_genotype and bool arrays.
- Add initial implementation of distributed encode
- Fix bug in schema handling (compressor settings ignored)
- Move making ICF field partition directories into per-partition processing. Remove progress on the init mkdirs step.
- Turn off progress monitor on dexplode-partition
- Fix empty partition bug
- Fix bug in --max-memory handling, and change the argument to a string like 10G
- Add compressor choice in explode, switch default to zstd
- Run mkdirs in parallel and provide progress
- Change dimension separator to "/" in Zarr
- Update min Zarr version to 2.17
- Various refinements to the CLI
- Merge 1D and 2D encode steps into one, and change rate reporting to bytes
- Add --max-memory for encode
- Change chunk_width to samples_chunk_size and chunk_length to variants_chunk_size
- Various updates to the intermediate chunked format, with breaking change to version 0.2
- Add distributed explode commands