initial build workflows in snakemake (#307)
* tweaking snakemake workflow for row diff 2.0
* grouping options in workflow build command
* fixing failing test
* adding missing files
* fixing primary graph creation workflow
* tweaking how input sequence files are handled, adding --seqs-dir-path argument
* fixing primary graph creation workflow
* separating out workflow package
* renaming command line tool to metagraph-workflows
* adding --additional-snakemake-args parameter
* improving default handling
* moving some python code out to common.smk
* updating readme, still rough draft
* integrating metagraph-workflows test in CI
* Actions: tweak setup of metagraph binary
* moving directory with snakemake code into metagraph_workflows to simplify packaging
* workflows: improve implementation of memory config management
* adding --disk-swap option in more rules, some refactoring
* adding build.smk which was ignored
* incorporate mem config for every rule
* updating workflow graph
* add missing cfg_utils.py file
* first iteration on supporting workflow for building graphs on a per sample basis separately
* tweaking data staging mechanism and including it in the example workflow. moving example workflow related files into a separate directory
* cleaning up pypi packaging
* adding some more parametrization of metagraph commands
* using snakemake's log directive systematically
* change lookup for rule configs, as the current one doesn't seem to work reliably (i.e. configs from the wrong rule are looked up)
* changing some directory names
* by default, remove intermediary output files during the build phase (files can be kept using the --notemp option of snakemake)
* adding disk-cap, mem-cap and swap-dir to more rules. also fixed some build rules to use canonical mode
* adding verbose flag to all build rules
* including KMC, improving resource management
* using buffer instead of cap, e.g. renamed mem_cap to mem_buffer
* renaming exec_cmd -> metagraph_cmd
* timing commands using GNU time
* when estimating memory buffer size, cap maximum at 50GB
* improving logging
* supporting samples consisting of several files
* making a parameter out of MAX_BUFFER_SIZE
* making it possible to set number of threads via config for primarize_canonical_graph_single_sample and build_canonical_graph_single_sample
* tweak memory heuristics for primarize_canonical_graph_single_sample
* fixing test in test_resource_management
* fixing unit of disk-cap
* moving all kinds of utility functions to utils.py
* merging common.py and constants.py and renaming it to workflow_configs
* moving 'build' subcommand related stuff to cli.py
* remote rule graph related files
* renaming example_workflow to test_workflow
* updating setup.py
* better error message in get_gnu_time_command
* adding jupyter notebook with end to end example from indexing to querying using the python api
* throwing exception instead of return status code in run_build_workflow
* use sequences from ncbi in workflow_end_to_end_example notebook
* removing template generated Makefile in workflows python package
* fixing _convert_type
* moving content of workflows README to sphinx documentation
* some dangling changes after renaming directory
Marc Zimmermann authored Oct 22, 2021
1 parent 5c9d0ea, commit 1cf7a6e
Showing 37 changed files with 4,437 additions and 10 deletions.
@@ -31,3 +31,4 @@ Usage
For more examples, see `notebooks
<./notebooks>`_.

@@ -0,0 +1,105 @@
=========
Workflows
=========

This package provides workflows for the `metagraph framework
<https://metagraph.ethz.ch>`_.


Workflows for Creating Graphs and Annotations
---------------------------------------------

Since creating graphs and indices comprises several steps, this package provides
some support to simplify these tasks, in particular for standard cases.

Given some raw sequence data and a few options such as the k-mer size (``k``), graphs and annotations
are built automatically:

.. code-block:: bash

    metagraph-workflows build -k 5 transcript_paths.txt /tmp/mygraph

If you prefer invoking the workflow from within a Python script, the following is equivalent:

.. code-block:: python

    from metagraph_workflows import workflows
    workflows.run_build_workflow('/tmp/mygraph', seqs_file_list_path='transcript_paths.txt', k=5)

The workflow logic itself is expressed as a `Snakemake workflow
<https://snakemake.readthedocs.io/>`_. You can also invoke the workflows directly
using the ``snakemake`` command line tool (see below).


Installation and Setup
~~~~~~~~~~~~~~~~~~~~~~


Set up a conda environment and install the necessary packages using:

.. code-block:: bash

    conda create -n metagraph-workflows python=3.8
    conda activate metagraph-workflows
    conda install -c bioconda -c conda-forge metagraph
    pip install -U "git+https://github.com/ratschlab/metagraph.git#subdirectory=metagraph/workflows"

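
As a quick sanity check after installation, the following minimal sketch reports which
command line tools cannot be found on the ``PATH``. The helper name ``missing_tools`` is
hypothetical and not part of the package. The tool names are taken from the commands used
in this documentation, and it is an assumption that the conda package provides an
executable called ``metagraph``.

.. code-block:: python

    import shutil


    def missing_tools(tools=("metagraph", "metagraph-workflows", "snakemake")):
        """Return the names in `tools` that cannot be found on the PATH.

        The default tool names are assumptions based on the commands shown
        in this documentation.
        """
        return [tool for tool in tools if shutil.which(tool) is None]

Calling ``missing_tools()`` inside the activated conda environment should return an empty
list if everything was installed as expected.
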
Usage Example
~~~~~~~~~~~~~

Typically, the following steps would be performed:

1. Prepare the sequence files: put the sequence files of interest into a directory.
2. Run the workflow: invoke it using ``metagraph-workflows build``. Important parameters you may consider tuning are:

   * ``k``
   * primary vs. non-primary graph creation
   * annotation label source: ``sequence_headers`` or ``sequence_file_names``

   An example invocation:

   .. code-block:: bash

       metagraph-workflows build -k 31 \
                           --seqs-dir-path [PATH_TO_SEQUENCES] \
                           --annotation-labels-source sequence_headers \
                           --build-primary-graph \
                           [OUTPUT_DIR]

   See ``metagraph-workflows build --help`` for more options.
3. Run queries: once the indices are created, you can query them either with the command line
   query tool or by starting the metagraph server on your laptop or another suitable machine and
   querying it using, e.g., the Python :ref:`API` client.
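
If you would rather pass a file list (as in the ``transcript_paths.txt`` example) than a
directory, the preparation step can be scripted. The following is a minimal sketch, not part
of the package: the helper name ``write_seqs_file_list``, the set of file suffixes, and the
one-path-per-line format of the list file are all assumptions.

.. code-block:: python

    from pathlib import Path

    # Assumed set of sequence file suffixes; adjust to your data.
    SEQ_SUFFIXES = (".fa", ".fasta", ".fa.gz", ".fasta.gz")


    def write_seqs_file_list(seqs_dir, list_path, suffixes=SEQ_SUFFIXES):
        """Write the absolute paths of all sequence files in `seqs_dir` to
        `list_path`, one path per line (assumed format), and return them."""
        paths = sorted(
            p.resolve()
            for p in Path(seqs_dir).iterdir()
            if p.is_file() and p.name.endswith(tuple(suffixes))
        )
        Path(list_path).write_text("".join(f"{p}\n" for p in paths))
        return paths

The resulting file could then be passed as ``seqs_file_list_path`` to
``workflows.run_build_workflow``.
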


There is also a `Jupyter notebook <https://github.com/ratschlab/metagraph/blob/master/metagraph/workflows/notebooks/workflow_end_to_end_example.ipynb>`_ walking you through an example from indexing to API querying.


Workflow Management
~~~~~~~~~~~~~~~~~~~

The following snakemake options are exposed in the ``build`` subcommand:

* ``--dryrun``: show which workflow steps would be run, without executing them
* ``--force`` (corresponds to ``--forceall`` in snakemake): force all steps to run
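
When driving the command line tool from a script, it can be convenient to do a dry run
before the actual build. The sketch below assembles the argument list from the options
documented here. The helper names ``build_argv`` and ``run_build`` are hypothetical, and
placing the options before the output directory simply follows the example invocation above.

.. code-block:: python

    import subprocess


    def build_argv(output_dir, k, seqs_dir_path, dryrun=False, force=False):
        """Assemble the argument list for ``metagraph-workflows build``."""
        argv = ["metagraph-workflows", "build", "-k", str(k),
                "--seqs-dir-path", str(seqs_dir_path)]
        if dryrun:
            argv.append("--dryrun")  # only show the steps that would be run
        if force:
            argv.append("--force")   # rerun all steps
        argv.append(str(output_dir))
        return argv


    def run_build(*args, **kwargs):
        """Run the command; raises CalledProcessError on a non-zero exit status."""
        return subprocess.run(build_argv(*args, **kwargs), check=True)

For example, ``run_build('/tmp/mygraph', 31, 'seqs', dryrun=True)`` would show the planned
steps, and the same call without ``dryrun=True`` would then execute them.
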


Directly Invoking Snakemake Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``metagraph-workflows build`` command is only a wrapper around a snakemake workflow. You can also
invoke the snakemake workflow directly (assuming you checked out the `metagraph git repository <https://github.com/ratschlab/metagraph>`_):

.. code-block:: bash

    cd metagraph/workflows
    snakemake --forceall --configfile default.yml \
        --config k=5 seqs_file_list_path='transcript_paths.txt' output_directory=/tmp/mygraph \
        annotation_labels_source=sequence_headers --cores 2
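
The same snakemake invocation can be assembled from a Python dictionary, which may be handy
when the configuration values are computed by a script. This is a minimal sketch; the helper
name ``snakemake_argv`` is hypothetical, and only the snakemake options shown in the command
above are used.

.. code-block:: python

    def snakemake_argv(config, cores, configfile="default.yml", forceall=False):
        """Assemble a snakemake argument list from a dict of config values."""
        argv = ["snakemake"]
        if forceall:
            argv.append("--forceall")
        argv += ["--configfile", configfile, "--config"]
        # each config entry is passed as key=value
        argv += [f"{key}={value}" for key, value in config.items()]
        argv += ["--cores", str(cores)]
        return argv


    argv = snakemake_argv(
        {
            "k": 5,
            "seqs_file_list_path": "transcript_paths.txt",
            "output_directory": "/tmp/mygraph",
            "annotation_labels_source": "sequence_headers",
        },
        cores=2,
        forceall=True,
    )

The list could then be run from the ``metagraph/workflows`` directory, for example with
``subprocess.run(argv, cwd='metagraph/workflows', check=True)``.
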
@@ -0,0 +1,21 @@
# http://editorconfig.org

root = true

[*]
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true
charset = utf-8
end_of_line = lf

[*.bat]
indent_style = tab
end_of_line = crlf

[LICENSE]
insert_final_newline = false

[Makefile]
indent_style = tab
@@ -0,0 +1,108 @@
.snakemake
metagraph_workflows/snakemake/output_dir_example

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

# virtualenv
.venv
venv/
ENV/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# Pycharm
.idea
@@ -0,0 +1,24 @@

MIT License

Copyright (c) 2021, ETH Zurich, Biomedical Informatics Group; Marc Zimmermann

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

@@ -0,0 +1,12 @@
include LICENSE
include requirements.txt

recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]

recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif

recursive-include metagraph_workflows/snakemake *.smk Snakefile default.yml
recursive-include metagraph_workflows/snakemake/test_data *.fa
recursive-exclude **/.snakemake *
@@ -0,0 +1,8 @@
===================
metagraph_workflows
===================

This package provides workflows for the `metagraph framework
<https://metagraph.ethz.ch>`_.

See the `corresponding section <https://metagraph.ethz.ch/static/docs/workflows.html>`_ in the metagraph documentation.
@@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-

"""Top-level package for metagraph_workflows."""

__author__ = """Marc Zimmermann"""
__email__ = '[email protected]'
__version__ = '0.1.0'