Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CUT&RUN tutorial #5104

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
69 changes: 25 additions & 44 deletions topics/epigenetics/tutorials/cut_and_run/tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ We first have to check if our data contains adapter sequences that we have to re
> - *"Input Collection"*: `2 PE fastqs`
>
> 2. {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters:
> - *"Short read data from your current history"*: Choose the output of **Flatten collection** {% icon tool %}selected as **Dataset collection**.
> - {% icon param-collection %} *"Raw read data from your current history"*: Choose the output of **Flatten collection** {% icon tool %} selected as **Dataset collection**.
> 3. Inspect the web page output of **FastQC** {% icon tool %} for the `Rep1_forward` sample. Check what adapters are found at the end of the reads.
>
> > <question-title></question-title>
Expand Down Expand Up @@ -159,7 +159,7 @@ We first have to check if our data contains adapter sequences that we have to re
> > >
> > > 4. **Adapter Content**
> > >
> > > Our data contains an adapter that we still have to remove.
> > > Our data contains an adapter (Illumina Universal Adapter) that we still have to remove.
> > >
> > {: .solution}
> >
Expand All @@ -179,7 +179,7 @@ The FastQC report pointed out that we have in our data some standard Illumina ad

> <hands-on-title>Task description</hands-on-title>
>
> 1. {% tool [Trim Galore!]( https://toolshed.g2.bx.psu.edu/view/bgruening/trim_galore/cd7e644cae1d) %} with the following parameters:
> 1. {% tool [Trim Galore!](toolshed.g2.bx.psu.edu/repos/bgruening/trim_galore/trim_galore/0.6.7+galaxy0) %} with the following parameters:
> - *"Is this library paired- or single-end?"*: `Paired Collection`
> - *"Select a paired collection"*: select `2 PE fastqs`
> - In *"Adapter sequence to be trimmed"*: `Illumina universal`
Expand All @@ -188,7 +188,7 @@ The FastQC report pointed out that we have in our data some standard Illumina ad
> - In *"Discard reads that became shorter than length N"*: `15`
> - In *"Generate a report file"*: `Yes`
>
> 2. Click on the {% icon galaxy-eye %} (eye) icon of the report for `Rep1` and read the first lines.
> 2. Click on the {% icon galaxy-eye %} (eye) icon of the report for `Rep1` and look for the info.
{: .hands_on}

> <question-title></question-title>
Expand All @@ -198,8 +198,8 @@ The FastQC report pointed out that we have in our data some standard Illumina ad
>
> > <solution-title></solution-title>
> >
> > 1. ~55% and ~57%
> > 2. 100%
> > 1. ~55% for Read 1 and ~57% for Read 2
> > 2. The last line indicates that 3.5% of pairs have been removed.
lldelisle marked this conversation as resolved.
Show resolved Hide resolved
> >
> {: .solution}
>
Expand All @@ -216,32 +216,13 @@ The hg38 version of the human genome contains [alternate loci](https://www.ncbi.
are present both in the canonical chromosome and on its alternate loci. The reads that map to these regions would map twice. To be able to filter reads falling into
repetitive regions but keep reads falling into regions present in alternate loci, we will map on the Canonical version of hg38 (only the chromosome with numbers, chrX, chrY, and chrM).

> <comment-title>Dovetailing</comment-title>
> We will allow dovetailing of read pairs with Bowtie2. This is because adapters are removed by Cutadapt only when at least 3 bases match the adapter sequence, so it is possible that after trimming a read can contain 1-2 bases of adapter and go beyond it's mate start site. This occurs especially for CUT&RUN because the read length is quite short. Bowtie thus discards reads such as:
> ```
> ---------------------Mate 1--------------------------------->
> <---------------------Mate 2----------------------
> ```
>
> or
>
> ```
> ---------------------Mate 1--------------------------------->
> <---------------------Mate 2---------------------------------
> ```
>
> This is what we call dovetailing and we want to consider this pair as a valid concordant alignment.
{: .comment}


lldelisle marked this conversation as resolved.
Show resolved Hide resolved
> <hands-on-title>Mapping reads to reference genome</hands-on-title>
>
> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.4.5+galaxy1) %} with the following parameters:
> 1. {% tool [Bowtie2](toolshed.g2.bx.psu.edu/repos/devteam/bowtie2/bowtie2/2.5.3+galaxy1) %} with the following parameters:
> - *"Is this single or paired library"*: `Paired-end Dataset Collection`
> - *"FASTQ Paired Dataset*: select the output of **Trim Galore!** {% icon tool %} *"paired reads"*
> - *"Do you want to set paired-end options?"*: `Yes`
> - *"Set the maximum fragment length for valid paired-end alignments"*: `1000`
> - *"Allow mate dovetailing"*: `Yes`
> - *"Will you select a reference genome from your history or use a built-in index?"*: `Use a built-in genome index`
> - *"Select reference genome"*: `Human (Homo sapiens): hg38 Canonical`
> - *"Set read groups information?"*: `Do not set`
Expand All @@ -264,7 +245,7 @@ repetitive regions but keep reads falling into regions present in alternate loci
>
> > <solution-title></solution-title>
> >
> > 41.46+57.51=98.97%
> > 36.47+62.39=98.86%
lldelisle marked this conversation as resolved.
Show resolved Hide resolved
> >
> {: .solution}
>
Expand All @@ -280,21 +261,21 @@ so we remove these reads. We also remove reads with low mapping quality and read

> <hands-on-title>Filtering of uninformative reads</hands-on-title>
>
> 1. {% tool [Filter BAM datasets on a variety of attributes](toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.1+galaxy0) %} with the following parameters:
> 1. {% tool [Filter BAM datasets on a variety of attributes](toolshed.g2.bx.psu.edu/repos/devteam/bamtools_filter/bamFilter/2.5.2+galaxy2) %} with the following parameters:
> - {% icon param-collection %} *"BAM dataset(s) to filter"*: Select the output of **Bowtie2** {% icon tool %} *"alignments"*
> - In *"Condition"*:
> - {% icon param-repeat %} *"Insert Condition"*
> - {% icon param-repeat %} *"Condition"*
> - In *"Filter"*:
> - {% icon param-repeat %} *"Insert Filter"*
> - *"Select BAM property to filter on"*: `mapQuality`
> - *"Filter"*
> - *"Select BAM property to filter on"*: `Mapping quality`
> - *"Filter on read mapping quality (phred scale)"*: `>=30`
> - {% icon param-repeat %} *"Insert Filter"*
> - *"Select BAM property to filter on"*: `Proper Pair`
> - *"Select properly paired reads"*: `Yes`
> - {% icon param-repeat %} *"Insert Filter"*
> - *"Select BAM property to filter on"*: `Reference name of the read`
> - *"Filter on the reference name for the read"*: `!chrM`
> - *"Would you like to set rules?"*: `No`
> - *"Would you like to set rules?"*: `False`
>
>
> 2. Click on the input and the output BAM files of the filtering step. Check the size of the files.
Expand All @@ -308,7 +289,7 @@ so we remove these reads. We also remove reads with low mapping quality and read
>
> > <solution-title></solution-title>
> >
> > 1. The original BAM files are 14-15 MB, the filtered ones are 5.6 and 6.8 MB. Approximately half of the alignments were removed.
> > 1. The original BAM files are 14-15 MB, the filtered ones are 5.8 and 7.1 MB. Approximately half of the alignments were removed.
> >
> > 2. You should modify the mapQuality criteria and decrease the threshold.
> >
Expand All @@ -322,7 +303,7 @@ Because of the PCR amplification, there might be read duplicates (different read

> <hands-on-title>Remove duplicates</hands-on-title>
>
> 1. {% tool [MarkDuplicates](toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/2.18.2.3) %} with the following parameters:
> 1. {% tool [MarkDuplicates](toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_MarkDuplicates/3.1.1.0) %} with the following parameters:
> - {% icon param-collection %} *"Select SAM/BAM dataset or dataset collection"*: Select the output of **Filter** {% icon tool %} *"BAM"*
> - *"If true do not write duplicates to the output file instead of writing them with appropriate flags set"*: `Yes`
>
Expand Down Expand Up @@ -363,8 +344,8 @@ Because of the PCR amplification, there might be read duplicates (different read
>
> > <solution-title></solution-title>
> >
> > 1. 81466
> > 2. 983
> > 1. 81460
> > 2. 982
> >
> {: .solution}
>
Expand All @@ -378,7 +359,7 @@ too much compared to the diversity of the library you generated. Consequently, l
> <hands-on-title>Check Adapter Removal with FastQC</hands-on-title>
>
> 1. {% tool [FastQC](toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0) %} with the following parameters:
> - {% icon param-collection %} *"Short read data from your current history"*: select the output of **MarkDuplicates** .
> - {% icon param-collection %} *"Raw read data from your current history"*: select the output of **MarkDuplicates** BAM.
>
> 2. Click on the {% icon galaxy-eye %} (eye) icon of the report and read the first lines.
{: .hands_on}
Expand Down Expand Up @@ -445,7 +426,7 @@ We convert the BAM file to BED format because when we set the extension size in

> <hands-on-title>Convert the BAM to BED</hands-on-title>
>
> 1. {% tool [bedtools BAM to BED converter](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_bamtobed/2.30.0+galaxy2) %} with the following parameters:
> 1. {% tool [bedtools BAM to BED converter](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_bamtobed/2.31.1+galaxy0) %} with the following parameters:
> - {% icon param-collection %} *"Convert the following BAM file to BED"*: Select the output of **MarkDuplicates** {% icon tool %} BAM output
>
{: .hands_on}
Expand All @@ -455,7 +436,7 @@ We call peaks with MACS2. To get the coverage centered on the 5' extended 100bp

> <hands-on-title>Call peaks with MACS2</hands-on-title>
>
> 1. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.7.1+galaxy0) %} with the following parameters:
> 1. {% tool [MACS2 callpeak](toolshed.g2.bx.psu.edu/repos/iuc/macs2/macs2_callpeak/2.2.9.1+galaxy0) %} with the following parameters:
> - *"Are you pooling Treatment Files?"*: `No`
> - {% icon param-collection %} Select the output of **bedtools BAM to BED** converter {% icon tool %}
> - *"Do you have a Control File?"*: `No`
Expand Down Expand Up @@ -512,7 +493,7 @@ We can remove such peaks if we simply overlap the two peak files and consequentl

> <hands-on-title>Select robust GATA1 peaks:</hands-on-title>
>
> 1. {% tool [bedtools Intersect intervals find overlapping intervals in various ways](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_intersectbed/2.30.0+galaxy1) %} with the following parameters:
> 1. {% tool [bedtools Intersect intervals find overlapping intervals in various ways](toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_intersectbed/2.31.1+galaxy0) %} with the following parameters:
> - {% icon param-file %} *"File A to intersect with B"*: Select `Rep1`
> - *"Combined or separate output files"*: `One output file per 'input B' file`
> - {% icon param-file %} *"File B to intersect with A"*: Select `Rep2`
Expand Down Expand Up @@ -571,9 +552,9 @@ We can further remove some noise with a positive control, that is why we have do
> >
> > > <solution-title></solution-title>
> > >
> > > 1. 2,409 peaks
> > > 2. 454 peaks
> > > 3. Our precision is ~ 84%. A high precision is an indicator that we can predict true binding regions with high confidence.
> > > 1. 2,865 peaks (this is the number of lines of our last output)
> > > 2. About 4000 peaks (Rep1 and Rep2 has about 6-7 thousand peaks each)
lldelisle marked this conversation as resolved.
Show resolved Hide resolved
> > > 3. Our precision is ~ 33%. A high precision is an indicator that we can predict true binding regions with high confidence.
> > {: .solution}
> >
> {: .question}
Expand Down Expand Up @@ -622,7 +603,7 @@ Let's find out the sequence motifs of the TF GATA1. Studies have revealed that G
> >
> > > <solution-title></solution-title>
> > >
> > > 1. !!!! Where is this info? !!!
> > > 1. Where is this info?
lldelisle marked this conversation as resolved.
Show resolved Hide resolved
> > > 2. E-value = 5.0e-383
> > {: .solution}
> >
Expand Down
Loading