This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

chore: Set up Summer 22 version
edmundmiller committed May 31, 2023
1 parent 5329472 commit 354d198
Showing 31 changed files with 1,709 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docusaurus.config.js
@@ -68,6 +68,9 @@ module.exports = {
  lastVersion: 'current',
  versions: {
    current: {
      label: 'Summer 23',
    },
    "22u": {
      label: 'Summer 22',
    },
    "21u": {
36 changes: 36 additions & 0 deletions versioned_docs/version-22u/00-overview.md
@@ -0,0 +1,36 @@
---
id: overview
title: Course Overview
description: Course Overview
sidebar_label: 'Overview'
---

# Overview

| WEEK | TUESDAY | THURSDAY |
| ---: | ---------------------------------------------------------------- | ---------------------------------------- |
| 1 | Introduction/[Setup Environment](./week_01/environment_setup.md) | [Intro to Unix](./week_01/intro_unix.md) |
| 2 | [Reproducible Computing](./week_02/intro.md) | Group Project 1 Introduction Lab |
| 3 | RNA-seq by Example | RNA-seq by Example |
| 4 | RNA-seq by Example | RNA-seq by Example |
| 5 | The Grouchy Grinch | The Grouchy Grinch |
| 6 | RNA-seq Presentations / ChIP-Seq Intro | ChIP-Seq |
| 7 | Nextflow Scripting | Nextflow Scripting |
| 8 | ChIP-seq Pipeline | ChIP-seq Pipeline |
| 9 | Project 2 Demo day / Intro to module 3 project | Variant Calling |
| 10 | Intro to Variant Calling | Variant Calling Continued/Xena Browser |
| 11 | Project Work Day | Group Demo Day/Concluding Remarks |

Issues with Biostars? [Create an issue!](https://github.com/biostars/biostar-handbook/issues/new)

# Course Alumni

| Alumni | Semester | GitHub | ag-intro Repo |
| ------------------ | -------- | ------------- | --------------------------------------------------------------------------- |
| Stephanie Yamauchi | 21U | syamauchi2000 | [syamauchi2000/ag-intro](https://github.com/syamauchi2000/ag-intro) |
| Hiba Fatima | 21U | hxf190002 | [hxf190002/ag-intro](https://github.com/hxf190002/ag-intro) |
| Mufeed Kamal | 21U | Mufeedmk4 | [Mufeedmk4/ag-intro](https://github.com/Mufeedmk4/ag-intro) |
| Saleh Karim | 21U | Salehkarim21 | [Salehkarim21/6-1-2021-Repo](https://github.com/Salehkarim21/6-1-2021-Repo) |
| Muneer Yaqub | 21U | muneeryaqub | [muneeryaqub/ag-intro](https://github.com/muneeryaqub/ag-intro) |
| Luke Ballew | 21U | lxb190012 | [lxb190013/ag-intro](https://github.com/lxb190013/ag-intro) |
4 changes: 4 additions & 0 deletions versioned_docs/version-22u/chip-seq/_category_.json
@@ -0,0 +1,4 @@
{
"label": "ChIP-Seq",
"position": 4
}
27 changes: 27 additions & 0 deletions versioned_docs/version-22u/chip-seq/biostars.md
@@ -0,0 +1,27 @@
---
id: biostars
title: Biostars ChIP-Seq
description: 'Notes and issues we ran into'
sidebar_label: 'Biostars'
sidebar_position: 1
---

Replace

```sh
# Create a namespace for the tool
conda create --name macs python=2.7

# Activate the new environment.
source activate macs

# Install the tools.
conda install numpy
conda install macs2
```

with

```sh
conda create -n macs bioconda::macs2=2.2.7.1
```
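
A quick way to confirm the new environment works (a sketch; it assumes the `conda create` command above finished without errors):

```sh
conda activate macs
macs2 --version
```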
111 changes: 111 additions & 0 deletions versioned_docs/version-22u/chip-seq/nextflow.md
@@ -0,0 +1,111 @@
---
id: nextflow
title: Nextflow
description: 'Data-driven computational pipelines'
sidebar_label: 'Nextflow'
sidebar_position: 2
---

## Workflow managers

The `Makefile` has been getting a little scary. It's great for one-off commands
for a project, but not so much for full-blown data pipelines. There are plenty
of more modern alternatives:

- [CWL](https://www.commonwl.org/user_guide/index.html)
- [WDL](https://github.com/openwdl/wdl)
- [Snakemake](https://snakemake.readthedocs.io/en/stable/)
- [Nextflow](https://www.nextflow.io/)

## What is Nextflow?

Nextflow is an incredibly powerful and flexible workflow language. It's mainly
used for bioinformatics analysis.

```groovy title="main.nf"
/*
* Default pipeline parameters. They can be overriden on the command line eg.
* given `params.foo` specify on the run command line `--foo some_value`.
*/
params.reads = "$baseDir/data/ggal/*_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
params.multiqc = "$baseDir/multiqc"
log.info """\
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: ${params.transcriptome}
reads : ${params.reads}
outdir : ${params.outdir}
"""
// import modules
include { RNASEQ } from './modules/rnaseq'
include { MULTIQC } from './modules/multiqc'
/*
* main script flow
*/
workflow {
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
RNASEQ( params.transcriptome, read_pairs_ch )
MULTIQC( RNASEQ.out, params.multiqc )
}
/*
* completion handler
*/
workflow.onComplete {
log.info ( workflow.success ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n" : "Oops .. something went wrong" )
}
```

The thing that sets Nextflow apart is that it _pushes_ the data through the
pipeline, rather than _pulling_ it through like make.
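
Here's a minimal sketch of that push model (the `COUNT_LINES` process and the file glob are made up for illustration): each file the channel emits is pushed straight into the process as soon as it's available, instead of a target asking for its prerequisites the way `make` does.

```groovy
// Hypothetical toy process, just to show data being pushed through a channel.
process COUNT_LINES {
    input:
    path reads

    output:
    stdout

    script:
    """
    wc -l $reads
    """
}

workflow {
    // Each matching file is emitted onto the channel and flows into the process.
    channel.fromPath("$baseDir/data/ggal/*.fq") | COUNT_LINES | view
}
```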

## Subworkflows

```groovy title="./modules/rnaseq.nf"
params.outdir = 'results'
include { INDEX } from './index'
include { QUANT } from './quant'
include { FASTQC } from './fastqc'
workflow RNASEQ {
take:
transcriptome
read_pairs_ch
main:
INDEX(transcriptome)
FASTQC(read_pairs_ch)
QUANT(INDEX.out, read_pairs_ch)
emit:
QUANT.out | concat(FASTQC.out) | collect
}
```

## Modules

```groovy title="./modules/index.nf"
process INDEX {
tag "$transcriptome.simpleName"
input:
path transcriptome
output:
path 'index'
script:
"""
salmon index --threads $task.cpus -t $transcriptome -i index
"""
}
```

[The full nextflow/rnaseq-nf example repo](https://github.com/nextflow-io/rnaseq-nf)
144 changes: 144 additions & 0 deletions versioned_docs/version-22u/chip-seq/nf-core.md
@@ -0,0 +1,144 @@
---
id: nf-core
title: nf-core
description: 'A community effort to collect a curated set of analysis pipelines built using Nextflow.'
sidebar_label: 'nf-core'
sidebar_position: 3
---

## nf-core Intro

<iframe width="560" height="315" src="https://www.youtube.com/embed/gUM9acK25tQ"
title="YouTube video player" frameborder="0" allow="accelerometer; autoplay;
clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>

> A community effort to collect a curated set of analysis pipelines built using
> Nextflow.

We have the genomics core, the imaging core, and other core facilities, and now we have nf-core!

Enough talk, let's run it!

### Testing a pipeline

[nf-core installation docs](https://nf-co.re/usage/installation)

1. Move into your chipseq repo
2. Install Nextflow

```bash
curl -fsSL get.nextflow.io | bash
mv nextflow ~/bin
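
# Optional sanity check (assumes ~/bin is already on your PATH):
nextflow -version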
```

3. Load the Singularity module

```bash
ml load singularity
```

4. Run

```bash
nextflow run nf-core/chipseq -profile test,utd_sysbio -r dev --outdir test-run
```

5. Update your `.gitignore`

```gitignore
.nextflow*
work/
data/
results/
```

## Running the nf-core pipeline

[Let's refer to the usage section of the pipeline's docs](https://nf-co.re/chipseq/dev/usage)

### Using the nf-core launcher

1. [Open up the nf-core launch utility](https://nf-co.re/launch?)
2. Select the `chipseq` pipeline, `dev` for the version and click Launch
3. Fill out the following command-line flags:

- profile: `utd_sysbio`
- input: `samplesheet.csv`
- email: `<netid>@utdallas.edu`
- read_length: 50
- genome: `hg19`

4. Create an `nf-params.json` file with the contents it generates.

```json title="nf-params.json"
{
"input": "samplesheet.csv",
"read_length": 50,
"outdir": "ming-results",
"email": "<netid>@utdallas.edu",
"genome": "hg19"
}
```

5. We're going to need to create a samplesheet. [Please refer to the usage section of the pipeline's docs](https://nf-co.re/chipseq/dev/usage)

The data has been predownloaded for you to the class scratch directory
`/scratch/applied-genomics/` under `chipseq/ming/`.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,antibody,control
WT_YAP1,/scratch/applied-genomics/chipseq/ming/SRR1810900.fastq.gz,,YAP1,WT_INPUT
WT_H3K27ac,/scratch/applied-genomics/chipseq/ming/SRR949140.fastq.gz,,H3K27ac,WT_INPUT
WT_INPUT,/scratch/applied-genomics/chipseq/ming/SRR949142.fastq.gz,,,
```

:::tip
If you can't get the formatting right for whatever reason, there's a backup samplesheet at `/scratch/applied-genomics/chipseq/ming/samplesheet.csv`; you just need to update the `input` path.
:::

6. Start `screen`, a terminal multiplexer that keeps your session running even if you disconnect

```bash
login$ screen
```

:::info
Useful screen commands
:::

```bash
# Start a new screen session:
screen

# Start a new named screen session:
screen -S session_name

# Reattach to an open screen:
screen -r session_name

# Detach from inside a screen:
Ctrl + A, D

# Kill the current screen session:
Ctrl + A, K
```

7. Launch the pipeline

```bash
nextflow run nf-core/chipseq -r dev -profile utd_sysbio -params-file nf-params.json
```

The pipeline should start up, and email you when it's finished!
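
If you'd rather not wait for the email, here are a couple of ways to peek at the run (a sketch, assuming you launched it inside `screen` from the run directory):

```bash
# Reattach to the screen session showing the live Nextflow progress output.
screen -r

# Or, from a new shell in the run directory, list the runs Nextflow knows about.
nextflow log
```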

While we're waiting, let's check out the [shell script that would have run all of that](https://www.biostarhandbook.com/ming-tangs-guide-to-chip-seq-analysis.html#shell-script-comes-to-rescue)

## Download the MultiQC Report

1. Open up the file explorer, navigate to
   `results/multiqc/multiqc_report.html`, _right-click_ the HTML
   file, and select Download.
2. Now that the MultiQC report is on your local computer, open it in a web
   browser, preferably next to the [pipeline's output
   docs](https://nf-co.re/chipseq/dev/output).
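
If you prefer the terminal over the file explorer, `scp` from your **local** machine works too (a sketch; the remote path below is a placeholder, so adjust it to your run directory and whatever `--outdir` you used):

```bash
# Placeholder path: point it at your actual run directory and outdir.
scp <netid>@sysbio.utdallas.edu:~/chipseq/results/multiqc/multiqc_report.html .
```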
53 changes: 53 additions & 0 deletions versioned_docs/version-22u/misc/code_alternatives.md
@@ -0,0 +1,53 @@
---
id: code_alternatives
title: VS Code Alternatives
description: ''
sidebar_label: 'VS Code Alternatives'
sidebar_position: 1
---

Due to changes in some of the UT Dallas systems, we're going to cover some extra
methods to log in, just in case. [Refer to Environment setup](../week-1) for
alternatives.

Windows:

- [Windows Terminal](https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701?activetab=pivot:overviewtab#)
- [git for Windows](https://gitforwindows.org/)
- [MobaXTerm and VS Code Setup](https://www.youtube.com/watch?v=GmMsTc55gLI)

MacOS:

- [iTerm2](https://iterm2.com/)

Once installed, open up a terminal, and try logging in.

:::danger
When typing in your password, there won't be any \*'s; the line will just stay blank. This is normal.
:::

```bash
ssh <netid>@sysbio.utdallas.edu
```

## Create SSH Keys

While we're at it, let's generate SSH keys so we don't have to type in our
password every time, and so we can use them with our Git repos as well. Public-key
authentication is more secure than passwords and has other advantages too.

[GitHub Docs for generating a new SSH key](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)
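
The GitHub docs walk through it, but the key-generation step itself is a one-liner (a sketch, assuming OpenSSH is installed; press Enter to accept the default file location):

```bash
ssh-keygen -t ed25519 -C "<netid>@utdallas.edu"
```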

First, add the public key to your GitHub account. Then copy it to the remote machine:

### Windows

```bash
scp C:\Users\username\.ssh\id_ed25519.pub <netid>@sysbio.utdallas.edu:~/.ssh/authorized_keys
```

### MacOS

```bash
scp ~/.ssh/id_ed25519.pub <netid>@sysbio.utdallas.edu:~/.ssh/authorized_keys
```
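
Note that copying straight to `~/.ssh/authorized_keys` overwrites any keys already there. Where `ssh-copy-id` is available (it ships with most OpenSSH installs), it appends instead, which is usually safer:

```bash
ssh-copy-id -i ~/.ssh/id_ed25519.pub <netid>@sysbio.utdallas.edu
```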
