Quaich_aging is an extension of Quaich specialized for the analysis of aging-related data. The extension adds modules for pairwise comparison of features obtained from Hi-C maps, as well as modules for plotting graphs that are convenient for further analysis.
Quaich is a Snakemake-based workflow for reproducible and flexible analysis of Hi-C data. Quaich uses multi-resolution cooler (.mcool) files as its input. These files can be generated efficiently by the distiller data processing pipeline. Quaich takes advantage of the open2c ecosystem for analysis of C data, primarily making use of command line tools from cooltools. Quaich also makes use of chromosight and mustache to call Hi-C peaks (loops, dots), as well as coolpuppy to generate lots of pileups.
Snakemake is a workflow manager for reproducible and scalable data analyses, based around the concept of rules. Rules used in Quaich are defined in the Snakefile. Quaich then uses a YAML config file to specify which rules to run, and which parameters to use for those rules.
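As a rough illustration of config-driven rule selection, the sketch below picks the analyses to run from a dictionary that stands in for the parsed YAML config. The key names (`do`, `resolutions`) and the helper `enabled_analyses` are assumptions made for this example and may not match the actual layout of the config file.

```python
# Hypothetical stand-in for the parsed YAML config.
# Key names ("do", "resolutions") are assumptions for illustration only.
config = {
    "eigenvector": {"do": True, "resolutions": [100000]},
    "saddle": {"do": False},
    "insulation": {"do": True, "resolutions": [10000, 20000]},
}


def enabled_analyses(cfg):
    """Return the names of analyses whose 'do' flag is truthy, in config order."""
    return [name for name, params in cfg.items() if params.get("do", False)]


selected = enabled_analyses(config)
print(selected)
```

In a workflow of this kind, the selected names would typically decide which output files are requested, and Snakemake would then run only the rules needed to produce them.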
Clone the repository to the location on your local system where you want to perform the data analysis, for example:
git clone git@github.com:ComputationalAgingLab/quaich_aging.git
Move to your working directory:
cd quaich_aging
Configure conda channel priority:
conda config --set channel_priority flexible
Install the requirements using conda (this may take some time):
conda env create -f workflow/envs/environment.yml
This will create an environment named quaich_aging in which you can launch the pipeline.
For Snakemake installation details, see the instructions in the Snakemake documentation.
Activate the conda environment:
conda activate quaich_aging
Configure the conda environment channel priority with the following small (but critical) line:
conda config --set channel_priority strict
Download the genome FASTA file required for the test (if needed, first make the script executable with chmod +x prepare_test.sh):
bash prepare_test.sh
Execute the test workflow locally via
snakemake --use-conda --configfile config/config.yml --cores 10
Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yml to configure the workflow execution, and samples.tsv to specify your sample setup. If you want to use any external bed or bedpe files for pileups, describe them in the annotations.tsv file, and describe the pairings of samples with annotations in samples_annotations.tsv.
Test your configuration by performing a dry-run via
snakemake --use-conda --configfile config/config.yml -n
As before, execute the workflow locally via
snakemake --use-conda --configfile config/config.yml --cores $N
using $N cores, or run it in a cluster environment via
snakemake --use-conda --configfile config/config.yml --cluster qsub --jobs 100
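The dry-run and local-run invocations above differ only in their trailing flags, so they can be assembled programmatically. The following is a minimal sketch that builds those command lines using only the flags shown in this document; the helper name `snakemake_command` is hypothetical, and nothing is executed unless you pass the result to `subprocess.run` yourself.

```python
import shlex


def snakemake_command(configfile="config/config.yml", cores=None, dry_run=False):
    """Build the snakemake argument list for a dry run or a local run."""
    cmd = ["snakemake", "--use-conda", "--configfile", configfile]
    if dry_run:
        cmd.append("-n")  # dry run: only report what would be executed
    elif cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


dry = snakemake_command(dry_run=True)
local = snakemake_command(cores=4)
print(shlex.join(dry))
print(shlex.join(local))
# To launch for real (requires snakemake in the active environment):
#   import subprocess
#   subprocess.run(dry, check=True)
#   subprocess.run(local, check=True)
```

Running the dry run first and the real run only if it succeeds is a cheap way to catch configuration mistakes before any computation starts.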
Try the mamba distribution, an alternative to conda that provides all of its functionality:
conda install -n base -c conda-forge mamba
Switch back to the base environment:
conda activate base
Then create the environment using mamba:
mamba env create -f workflow/envs/environment.yml
The following analyses can be configured in the original pipeline:
- eigenvector: calculates cis eigenvectors using cooltools for all resolutions within specified resolution_limits.
- saddle: calculates saddles, reflecting average interaction preferences, from cis eigenvectors for each sample using cooltools.
- pileups: extracts regions of interest (e.g. according to some bed file) from Hi-C maps and builds aggregated data frames containing averages of these regions.
- insulation: calculates diamond insulation score for specified resolutions and window sizes, using cooltools. Currently runs separately for different window sizes.
- call_dots: calls dots at specified resolutions using three methods, and postprocesses the output to bedpe. Implemented callers are cooltools, mustache and chromosight. Only runs on specified samples.
- compare_boundaries: generates differential boundaries between specified samples, used as input for pileups.
- call_TADs: combines lists of strong boundaries for specified samples into a list across window sizes for each resolution, filtered by length, used as input for pileups.
The following analyses are added in quaich_aging:
- interchroms: computes a matrix of contact sums for all possible pairs of chromosomes.
- compare_interchroms: plots the normalized ratio of selected pairs of contact sum matrices in the form of a heatmap.
- scaling_ratio: plots the ratio of selected pairs of scaling profiles.
- eigenvectors_correlation: plots an eigenvector correlation clustermap for each chromosome and for the full genome.
- tad_ratio: plots the ratio of selected pairs of averaged and normalized TADs.
- loop_ratio: plots the ratio of selected pairs of averaged and normalized loops.
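The interchroms and compare_interchroms steps can be pictured with a small, self-contained sketch: sum the contacts for every pair of chromosomes, then take the element-wise ratio of two such matrices. The normalization used here (scaling each matrix to a total of 1) is an assumption chosen for illustration, as are the function names, the sample names `young` and `old`, and the toy counts; the pipeline's own modules may compute and normalize these matrices differently.

```python
def contact_sums(contacts, chroms):
    """Sum contact counts for every pair of chromosomes into a square matrix."""
    index = {chrom: k for k, chrom in enumerate(chroms)}
    sums = [[0] * len(chroms) for _ in chroms]
    for chrom1, chrom2, count in contacts:
        i, j = index[chrom1], index[chrom2]
        sums[i][j] += count
        if i != j:
            sums[j][i] += count  # keep the matrix symmetric
    return sums


def normalized_ratio(sums_a, sums_b):
    """Element-wise ratio a/b after scaling each matrix to a total of 1
    (an assumed normalization). Cells where b is zero become None."""
    total_a = sum(map(sum, sums_a))
    total_b = sum(map(sum, sums_b))
    return [
        [(a / total_a) / (b / total_b) if b else None for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sums_a, sums_b)
    ]


chroms = ["chr1", "chr2"]
# Made-up contact counts for two hypothetical samples.
young = contact_sums([("chr1", "chr1", 80), ("chr1", "chr2", 10), ("chr2", "chr2", 100)], chroms)
old = contact_sums([("chr1", "chr1", 40), ("chr1", "chr2", 10), ("chr2", "chr2", 40)], chroms)

ratio = normalized_ratio(old, young)
for row in ratio:
    print(row)
```

A ratio above 1 in an off-diagonal cell means that, relative to the total, the first sample has more contacts between that pair of chromosomes than the second; a heatmap of this matrix is one way to display the comparison.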
- This is a fork of Ilya Flyamer's (@phlya) original project, modified by Dmitrii Kriukov (@shappiron) for aging-related data analysis.