Merge pull request #204 from jeromekelleher/cli-doc-updates
Cli doc updates
jeromekelleher authored May 14, 2024
2 parents 9983e73 + 8106d40 commit 6dd3c7d
Showing 6 changed files with 84 additions and 11 deletions.
30 changes: 22 additions & 8 deletions docs/installation.md
@@ -1,31 +1,45 @@
(sec-installation)=
# Installation


```
$ python3 -m pip install bio2zarr
```bash
python3 -m pip install bio2zarr
```

This will install the programs ``vcf2zarr``, ``plink2zarr`` and ``vcf_partition``
This will install the programs ``vcf2zarr`` and ``vcf_partition``
into your local Python path. You may need to update your $PATH to call the
executables directly.

Alternatively, calling
```
$ python3 -m bio2zarr vcf2zarr <args>
```bash
python3 -m bio2zarr vcf2zarr <args>
```
is equivalent to

```
$ vcf2zarr <args>
```bash
vcf2zarr <args>
```
and will always work.
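
As a minimal sketch of the point above, a script can choose between the two forms at run time instead of assuming that `$PATH` is configured. The helper name `vcf2zarr_invocation` is hypothetical and is not part of bio2zarr; only the two invocation forms shown above are used.

```bash
# Hypothetical helper (not part of bio2zarr): print the invocation
# that is usable on this machine.
vcf2zarr_invocation() {
    if command -v vcf2zarr >/dev/null 2>&1; then
        # The executable is on $PATH, so it can be called directly.
        echo "vcf2zarr"
    else
        # Fall back to the module form, which does not depend on $PATH.
        echo "python3 -m bio2zarr vcf2zarr"
    fi
}

echo "Using: $(vcf2zarr_invocation)"
```

In a script, the result can then be used as the command prefix, for example `$(vcf2zarr_invocation) <args>`.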

:::{note}
The ``python3 -m bio2zarr vcf2zarr`` form may be replaced with
``python3 -m bio2zarr.vcf2zarr`` in the near future.
See GitHub issue [203](https://github.com/sgkit-dev/bio2zarr/issues/203).
:::


:::{warning}
Windows is not currently supported. Please comment on
[this issue](https://github.com/sgkit-dev/bio2zarr/issues/174) if you would
like to see Windows support for bio2zarr.
:::


## Shell completion

To enable shell completion for a particular session in Bash do:

```
```bash
eval "$(_VCF2ZARR_COMPLETE=bash_source vcf2zarr)"
```

22 changes: 20 additions & 2 deletions docs/intro.md
@@ -1,9 +1,27 @@
# bio2zarr

`bio2zarr` efficiently converts common bioinformatics formats to
[Zarr](https://zarr.readthedocs.io/en/stable/) format. Initially supporting converting
VCF to the [VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/).
[Zarr](https://zarr.readthedocs.io/en/stable/) format.

## Tools

- {ref}`sec-vcf2zarr` converts VCF data to
[VCF Zarr](https://github.com/sgkit-dev/vcf-zarr-spec/) format.

- {ref}`sec-vcfpartition` is a utility to split an input (set of)
VCFs into a given number of partitions. This is useful for
parallel processing.

## Development status

`bio2zarr` is in development; contributions, feedback and issues are welcome
at the [GitHub repository](https://github.com/sgkit-dev/bio2zarr).

Support for converting PLINK data to VCF Zarr is partially implemented,
and adding BGEN support is also planned. If you would like to see
support for other formats (or are interested in helping to implement them),
please open an [issue on GitHub](https://github.com/sgkit-dev/bio2zarr/issues)
to discuss!

The package is currently focused on command line interfaces, but a
Python API is also planned.
2 changes: 2 additions & 0 deletions docs/vcf2zarr/cli_ref.md
@@ -1,3 +1,4 @@
(sec-vcf2zarr-cli-ref)=
# CLI Reference

% A note on cross references... There's some weird long-standing problem with
@@ -57,6 +58,7 @@
## Encode

```{eval-rst}
.. _cmd-vcf2zarr-encode:
.. click:: bio2zarr.cli:encode
:prog: vcf2zarr encode
:nested: full
39 changes: 38 additions & 1 deletion docs/vcf2zarr/overview.md
@@ -1,7 +1,44 @@
(sec-vcf2zarr)=
# vcf2zarr

Convert VCF data to the
[VCF Zarr specification](https://github.com/sgkit-dev/vcf-zarr-spec/)
reliably, in parallel or distributed over a cluster.

Convert a VCF to zarr format:
See the {ref}`sec-vcf2zarr-tutorial` for a step-by-step introduction
and the {ref}`sec-vcf2zarr-cli-ref` for detailed documentation on
command line options.


## Quickstart

First {ref}`install bio2zarr<sec-installation>`.


:::{note}
FINISH ME
:::


## How does it work?
The conversion of VCF data to Zarr is a two-step process:

1. Convert ({ref}`explode<cmd-vcf2zarr-explode>`) VCF file(s) to
Intermediate Columnar Format (ICF)
2. Convert ({ref}`encode<cmd-vcf2zarr-encode>`) ICF to Zarr

This two-step process allows `vcf2zarr` to determine the correct
dimension of Zarr arrays corresponding to each VCF field, and
to keep memory usage tightly bounded while writing the arrays.

:::{important}
The intermediate columnar format is not intended for any use
other than temporary storage while converting VCF to Zarr.
The format may change between versions of `bio2zarr`.
:::
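
The two steps above can be sketched as a small shell function. This is an illustration under stated assumptions, not the documented workflow: the helper name `vcf_to_zarr` and the `VCF2ZARR` override variable are hypothetical, and the argument order (input first, output last) is assumed by analogy with `vcf2zarr convert`. Check the CLI reference for the exact arguments and options.

```bash
# Hypothetical helper: run both conversion steps for a single VCF.
# Arguments: <VCF> <ICF> <zarr>. Argument order is an assumption.
# Set VCF2ZARR to override the command, e.g. VCF2ZARR=echo for a dry run.
vcf_to_zarr() {
    local vcf="$1" icf="$2" zarr="$3"
    local cmd="${VCF2ZARR:-vcf2zarr}"

    # Step 1: explode the VCF into Intermediate Columnar Format (ICF).
    $cmd explode "$vcf" "$icf" || return 1

    # Step 2: encode the ICF into the final Zarr output.
    $cmd encode "$icf" "$zarr" || return 1
}
```

A dry run such as `VCF2ZARR=echo vcf_to_zarr in.vcf.gz tmp.icf out.vcz` prints the two commands without executing them. Because the ICF is only temporary storage, it can presumably be deleted once the encode step has succeeded.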


## Common options

```
$ vcf2zarr convert <VCF1> <VCF2> <zarr>
1 change: 1 addition & 0 deletions docs/vcf2zarr/tutorial.md
@@ -9,6 +9,7 @@ kernelspec:
language: bash
name: bash
---
(sec-vcf2zarr-tutorial)=
# Tutorial

This is a step-by-step tutorial showing you how to convert your
1 change: 1 addition & 0 deletions docs/vcfpartition/overview.md
@@ -1,3 +1,4 @@
(sec-vcfpartition)=
# vcfpartition

## Overview