From b32485975421dc3f22c6e94699c4a3df59d33f54 Mon Sep 17 00:00:00 2001
From: Jerome Kelleher
Date: Tue, 14 May 2024 12:54:11 +0100
Subject: [PATCH 1/4] Document non-windows support.

Closes #169
---
 docs/installation.md | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/docs/installation.md b/docs/installation.md
index 52f7155..a1aaac2 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,31 +1,38 @@
 # Installation
-```
-$ python3 -m pip install bio2zarr
+```bash
+python3 -m pip install bio2zarr
 ```
-This will install the programs ``vcf2zarr``, ``plink2zarr`` and ``vcf_partition``
+This will install the programs ``vcf2zarr`` and ``vcf_partition``
 into your local Python path. You may need to update your $PATH to call the
 executables directly.
 Alternatively, calling
-```
-$ python3 -m bio2zarr vcf2zarr
+```bash
+python3 -m bio2zarr vcf2zarr
 ```
 is equivalent to
-```
-$ vcf2zarr
+```bash
+vcf2zarr
 ```
 and will always work.
+:::{warning}
+Windows is not currently supported. Please comment on
+[this issue](https://github.com/sgkit-dev/bio2zarr/issues/174) if you would
+like to see Windows support for bio2zarr.
+:::
+
+
 ## Shell completion
 To enable shell completion for a particular session in Bash do:
-```
+```bash
 eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"
 ```

From 77a7579770d26c2f1306b31304d453e03463b676 Mon Sep 17 00:00:00 2001
From: Jerome Kelleher
Date: Tue, 14 May 2024 13:33:42 +0100
Subject: [PATCH 2/4] Update top-level docs to give a basic idea of status

---
 docs/intro.md                 | 20 ++++++++++++++++++--
 docs/vcf2zarr/overview.md     |  5 ++++-
 docs/vcfpartition/overview.md |  1 +
 3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/docs/intro.md b/docs/intro.md
index 07f11c2..068e9c1 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -1,9 +1,25 @@
 # bio2zarr
 `bio2zarr` efficiently converts common bioinformatics formats to
-[Zarr](https://zarr.readthedocs.io/en/stable/) format. Initially supporting converting
-VCF to the [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/).
+[Zarr](https://zarr.readthedocs.io/en/stable/) format.
+
+## Tools
+
+- {ref}`sec-vcf2zarr` converts VCF data to
+  [VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.
+
+- {ref}`sec-vcfpartition` is a utility to split an input (set of)
+  VCFs into a given number of partitions. This is useful for
+  parallel processing.
+
+## Development status
 `bio2zarr` is in development, contributions, feedback and issues are welcome
 at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).
+Support for converting PLINK data to VCF Zarr is partially implemented,
+and adding BGEN support is also planned. If you would like to see
+support for other formats (or are interested in helping with implementing),
+please open an [issue on GitHub](https://github.com/sgkit-dev/bio2zarr/issues)
+to discuss!
+
diff --git a/docs/vcf2zarr/overview.md b/docs/vcf2zarr/overview.md
index 16234ae..8a50a6c 100644
--- a/docs/vcf2zarr/overview.md
+++ b/docs/vcf2zarr/overview.md
@@ -1,7 +1,10 @@
+(sec-vcf2zarr)=
 # vcf2zarr
-Convert a VCF to zarr format:
+Convert VCF data to the
+[VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
+reliably, in parallel.
 ```
 $ vcf2zarr convert
diff --git a/docs/vcfpartition/overview.md b/docs/vcfpartition/overview.md
index 144ff4c..6d9dbad 100644
--- a/docs/vcfpartition/overview.md
+++ b/docs/vcfpartition/overview.md
@@ -1,3 +1,4 @@
+(sec-vcfpartition)=
 # vcfpartition
 ## Overview

From 7dbbaad530676d56b4ba3974a0da6fd7e9a1dfda Mon Sep 17 00:00:00 2001
From: Jerome Kelleher
Date: Tue, 14 May 2024 13:59:30 +0100
Subject: [PATCH 3/4] Various documentation updates

---
 docs/installation.md      |  7 +++++++
 docs/intro.md             |  2 ++
 docs/vcf2zarr/cli_ref.md  |  2 ++
 docs/vcf2zarr/overview.md | 33 +++++++++++++++++++++++++++++++--
 docs/vcf2zarr/tutorial.md |  1 +
 5 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/docs/installation.md b/docs/installation.md
index a1aaac2..c5d975c 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -1,3 +1,4 @@
+(sec-installation)=
 # Installation
@@ -20,6 +21,12 @@ vcf2zarr
 ```
 and will always work.
+:::{note}
+The ``python3 -m bio2zarr vcf2zarr`` form may be replaced with
+``python3 -m bio2zarr.vcf2zarr`` in the near future.
+See GitHub issue [203](https://github.com/sgkit-dev/bio2zarr/issues/203).
+:::
+
 :::{warning}
 Windows is not currently supported. Please comment on
diff --git a/docs/intro.md b/docs/intro.md
index 068e9c1..c81efcf 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -23,3 +23,5 @@ support for other formats (or are interested in helping with implementing),
 please open an [issue on GitHub](https://github.com/sgkit-dev/bio2zarr/issues)
 to discuss!
+The package is currently focused on command line interfaces, but a
+Python API is also planned.
diff --git a/docs/vcf2zarr/cli_ref.md b/docs/vcf2zarr/cli_ref.md
index 0d31e8c..91f2391 100644
--- a/docs/vcf2zarr/cli_ref.md
+++ b/docs/vcf2zarr/cli_ref.md
@@ -1,3 +1,4 @@
+(sec-vcf2zarr-cli-ref)=
 # CLI Reference
 % A note on cross references... There's some weird long-standing problem with
@@ -57,6 +58,7 @@
 ## Encode
 ```{eval-rst}
+.. _cmd-vcf2zarr-encode:
 .. click:: bio2zarr.cli:encode
    :prog: vcf2zarr encode
    :nested: full
diff --git a/docs/vcf2zarr/overview.md b/docs/vcf2zarr/overview.md
index 8a50a6c..619f8d4 100644
--- a/docs/vcf2zarr/overview.md
+++ b/docs/vcf2zarr/overview.md
@@ -1,10 +1,39 @@
 (sec-vcf2zarr)=
 # vcf2zarr
-
 Convert VCF data to the
 [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
-reliably, in parallel.
+reliably, in parallel or distributed over a cluster.
+
+See the {ref}`sec-vcf2zarr-tutorial` for a step-by-step introduction
+and the {ref}`sec-vcf2zarr-cli-ref` for detailed documentation on
+command line options.
+
+
+## Quickstart
+
+First {ref}`install bio2zarr`
+
+
+## How does it work?
+The conversion of VCF data to Zarr is a two-step process:
+
+1. Convert ({ref}`explode`) VCF file(s) to
+   Intermediate Columnar Format (ICF)
+2. Convert ({ref}`encode`) ICF to Zarr
+
+This two-step process allows `vcf2zarr` to determine the correct
+dimension of Zarr arrays corresponding to each VCF field, and
+to keep memory usage tightly bounded while writing the arrays.
+
+:::{important}
+The intermediate columnar format is not intended for any use
+other than temporary storage while converting VCF to Zarr.
+The format may change between versions of `bio2zarr`.
+:::
+
+
+## Common options
 ```
 $ vcf2zarr convert
diff --git a/docs/vcf2zarr/tutorial.md b/docs/vcf2zarr/tutorial.md
index 8626899..223bb15 100644
--- a/docs/vcf2zarr/tutorial.md
+++ b/docs/vcf2zarr/tutorial.md
@@ -9,6 +9,7 @@ kernelspec:
   language: bash
   name: bash
 ---
+(sec-vcf2zarr-tutorial)=
 # Tutorial
 This is a step-by-step tutorial showing you how to convert your

From 8106d400ee26f2f7db1aea2eddad2e633ff645af Mon Sep 17 00:00:00 2001
From: Jerome Kelleher
Date: Tue, 14 May 2024 15:56:17 +0100
Subject: [PATCH 4/4] Some refinements to overview

---
 docs/vcf2zarr/overview.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/vcf2zarr/overview.md b/docs/vcf2zarr/overview.md
index 619f8d4..5ede464 100644
--- a/docs/vcf2zarr/overview.md
+++ b/docs/vcf2zarr/overview.md
@@ -12,7 +12,12 @@ command line options.
 ## Quickstart
-First {ref}`install bio2zarr`
+First {ref}`install bio2zarr`.
+
+
+:::{note}
+FINISH ME
+:::
 ## How does it work?
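
The two-step conversion described in the overview changes (explode VCF to ICF, then encode ICF to Zarr) could be driven from a shell roughly as sketched below. This is an illustration under assumptions, not text from the patches: the helper names `run_vcf2zarr` and `convert_two_step` and the file names are hypothetical, and the positional argument order for `explode` and `encode` is assumed, so check it against the CLI reference. The fallback to `python3 -m bio2zarr vcf2zarr` follows the installation notes, which say that form always works.

```bash
# Hypothetical helper: use the vcf2zarr executable when it is on PATH,
# otherwise fall back to the module form from the installation docs.
run_vcf2zarr() {
    if command -v vcf2zarr >/dev/null 2>&1; then
        vcf2zarr "$@"
    else
        python3 -m bio2zarr vcf2zarr "$@"
    fi
}

# Hypothetical wrapper for the two-step process.
# Assumed argument order: explode <vcf> <icf>, then encode <icf> <zarr>.
convert_two_step() {
    local vcf="$1" icf="$2" zarr="$3"
    run_vcf2zarr explode "$vcf" "$icf" &&   # step 1: VCF -> ICF
        run_vcf2zarr encode "$icf" "$zarr"  # step 2: ICF -> Zarr
}

# Example call with placeholder file names (commented out so that
# sourcing this snippet has no side effects):
# convert_two_step sample.vcf.gz sample.icf sample.vcz
```

Keeping the intermediate ICF path as an explicit argument makes it easy to delete afterwards, in line with the warning that ICF is only temporary storage.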