Example Pipeline


To run the RapClust pipeline, we need the following beforehand:

**1.** RNA-seq reads from the experiment under two different conditions, possibly with multiple replicates.

**Note**: Conditions are used only to attempt to disambiguate between transcript isoforms and transcripts of orthologous genes. Thus, even if you have more than two conditions, you can group your samples into *pseudo-conditions* (i.e. choosing a single major factor to differentiate between the groups), and RapClust will still produce a useful set of clusters.

**2.** *de novo* assembly (set of contigs) of the RNA-seq reads. Assembly can be performed using Trinity, which can be found [here](  
**Note**: The input assembly can come from any standard assembler; Trinity is used here only as an example.

**3.** Quantification of the RNA-seq reads separately in two different conditions using the above set of contigs as the reference.  
**Note**: Currently we only support [Sailfish]([Salmon](

**4.** RapClust source code/binary can be found [here](

## Pipeline:

### 1. *de novo* assembly:
`Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G`
* The output (i.e. the set of contigs) is written to `Trinity.fasta`.
* If you run into problems in this step, some tips are available [here]( or raise an issue [here](
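
Before moving on to quantification, it can help to confirm that the assembly actually contains contigs. The following is a minimal sketch, not part of Trinity or RapClust: `count_contigs` is a hypothetical helper that counts FASTA header lines, and the demo file is made-up data standing in for your real assembly.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count FASTA records by counting '>' header lines.
count_contigs() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence '|| true'.
  grep -c '^>' "$1" || true
}

# Demo on a tiny made-up FASTA file; replace "$demo" with the path
# to your own assembly when using this for real.
demo="$(mktemp)"
printf '>contig_1\nACGT\n>contig_2\nGGCC\nTTAA\n' > "$demo"
echo "contigs: $(count_contigs "$demo")"
rm -f "$demo"
```

A count of zero, or one far lower than expected, suggests the assembly step should be revisited before building an index.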

### 2. Quantification:
Either Sailfish or Salmon can be used here; the example below uses Sailfish.

* Clone and build Sailfish:  
`git clone`  
`cd sailfish && mkdir build && cd build`  
`cmake .. && make`

* Build the index for the reference (i.e. the set of contigs in our case), choosing a k-mer length such as 31:  
`sailfish index -t <ref_transcripts>/Trinity.fasta -o <out_dir>/index -k <kmer_len>`

* Quantify reads:  
Sailfish has to be run once per sample, so the number of runs depends on the number of replicates in each condition. Our example assumes two conditions (**A** and **B**) with three replicates (**1**, **2**, **3**) in each:  
`parallel -j 6 "sailfish quant -i index -l IU -1 <(gunzip -c reads/{}_1.fq.gz) -2 <(gunzip -c reads/{}_2.fq.gz) -o {}_quant --dumpEq -p 4" ::: A1 A2 A3 B1 B2 B3`
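
If GNU `parallel` is not available, the same quantification can be run one sample at a time. The sketch below assumes the sample names, the `reads/` directory and the `index` path used in the example above; `quant_cmd` and `DRY_RUN` are hypothetical names introduced only for illustration. By default it just prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two conditions (A, B) with three replicates each, as in the example.
samples=(A1 A2 A3 B1 B2 B3)

# DRY_RUN=1 (the default in this sketch) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: build the sailfish command line for one sample.
quant_cmd() {
  local samp="$1"
  printf 'sailfish quant -i index -l IU -1 <(gunzip -c reads/%s_1.fq.gz) -2 <(gunzip -c reads/%s_2.fq.gz) -o %s_quant --dumpEq -p 4\n' \
    "$samp" "$samp" "$samp"
}

for samp in "${samples[@]}"; do
  cmd="$(quant_cmd "$samp")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    # Run through bash so the <( ... ) process substitutions are honoured.
    bash -c "$cmd"
  fi
done
```

Running the script with `DRY_RUN=0` executes the commands sequentially, so it is slower than the `parallel` version but has no extra dependency.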

### 3. Clustering:
**Note**: A detailed explanation of this step can also be found [here](  
* If you have conda, then RapClust can be installed directly without having to manage its dependencies yourself.  
`conda create --name rapclust_env python=3`  
`source activate rapclust_env`  
`conda install -c k3yavi rapclust` or `conda install -c biopython rapclust`  
* [optional] The command below can be used to install Miniconda if conda is not available.  

* Make a configuration file:  
Make a file with the extension **.yaml** containing the following mandatory fields:  

      conditions:
          - A
          - B
      samples:
          A:
              - A1_quant
              - A2_quant
              - A3_quant
          B:
              - B1_quant
              - B2_quant
              - B3_quant
      outdir: <output_dir>/human_rapclust

* Run RapClust  
`RapClust --config <Name_of_file>.yaml`
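
For experiments with many samples, writing the configuration file by hand is error-prone. The sketch below generates it from a list of conditions and replicate numbers. It assumes the top-level keys are `conditions`, `samples` and `outdir`, and that each quantification directory is named `<condition><replicate>_quant` as in the quantification step; the file name `rapclust_config.yaml` and the output directory `rapclust_out` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names: adjust to your own conditions, replicates and paths.
config="rapclust_config.yaml"
outdir="rapclust_out"
conditions=(A B)
replicates=(1 2 3)

{
  # List of condition names.
  echo "conditions:"
  for c in "${conditions[@]}"; do
    echo "    - ${c}"
  done
  # Quantification directories grouped by condition.
  echo "samples:"
  for c in "${conditions[@]}"; do
    echo "    ${c}:"
    for r in "${replicates[@]}"; do
      echo "        - ${c}${r}_quant"
    done
  done
  echo "outdir: ${outdir}"
} > "${config}"

cat "${config}"
```

The generated file can then be passed to `RapClust --config rapclust_config.yaml`.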
