This repository has been archived by the owner on Apr 8, 2024. It is now read-only.

chore: Set up Summer 22 version
edmundmiller committed May 31, 2023
1 parent 5329472 commit 354d198
Showing 31 changed files with 1,709 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docusaurus.config.js
@@ -68,6 +68,9 @@ module.exports = {
  lastVersion: 'current',
  versions: {
    current: {
      label: 'Summer 23',
    },
    "22u": {
      label: 'Summer 22',
    },
    "21u": {
36 changes: 36 additions & 0 deletions versioned_docs/version-22u/00-overview.md
@@ -0,0 +1,36 @@
---
id: overview
title: Course Overview
description: Course Overview
sidebar_label: 'Overview'
---

# Overview

| WEEK | TUESDAY | THURSDAY |
| ---: | ---------------------------------------------------------------- | ---------------------------------------- |
| 1 | Introduction/[Setup Environment](./week_01/environment_setup.md) | [Intro to Unix](./week_01/intro_unix.md) |
| 2 | [Reproducible Computing](./week_02/intro.md) | Group Project 1 Introduction Lab |
| 3 | RNA-seq by Example | RNA-seq by Example |
| 4 | RNA-seq by Example | RNA-seq by Example |
| 5 | The Grouchy Grinch | The Grouchy Grinch |
| 6 | RNA-seq Presentations / ChIP-Seq Intro | ChIP-Seq |
| 7 | Nextflow Scripting | Nextflow Scripting |
| 8 | ChIP-seq Pipeline | ChIP-seq Pipeline |
| 9 | Project 2 Demo day / Intro to module 3 project | Variant Calling |
| 10 | Intro to Variant Calling | Variant Calling Continued/Xena Browser |
| 11 | Project Work Day | Group Demo Day/Concluding Remarks |

Issues with Biostars? [Create an issue!](https://github.com/biostars/biostar-handbook/issues/new)

# Course Alumni

| Alumni | Semester | GitHub | ag-intro Repo |
| ------------------ | -------- | ------------- | --------------------------------------------------------------------------- |
| Stephanie Yamauchi | 21U | syamauchi2000 | [syamauchi2000/ag-intro](https://github.com/syamauchi2000/ag-intro) |
| Hiba Fatima | 21U | hxf190002 | [hxf190002/ag-intro](https://github.com/hxf190002/ag-intro) |
| Mufeed Kamal | 21U | Mufeedmk4 | [Mufeedmk4/ag-intro](https://github.com/Mufeedmk4/ag-intro) |
| Saleh Karim | 21U | Salehkarim21 | [Salehkarim21/6-1-2021-Repo](https://github.com/Salehkarim21/6-1-2021-Repo) |
| Muneer Yaqub | 21U | muneeryaqub | [muneeryaqub/ag-intro](https://github.com/muneeryaqub/ag-intro) |
| Luke Ballew | 21U | lxb190012 | [lxb190013/ag-intro](https://github.com/lxb190013/ag-intro) |
4 changes: 4 additions & 0 deletions versioned_docs/version-22u/chip-seq/_category_.json
@@ -0,0 +1,4 @@
{
"label": "ChIP-Seq",
"position": 4
}
27 changes: 27 additions & 0 deletions versioned_docs/version-22u/chip-seq/biostars.md
@@ -0,0 +1,27 @@
---
id: biostars
title: Biostars ChIP-Seq
description: 'Notes and issues we ran into'
sidebar_label: 'Biostars'
sidebar_position: 1
---

Replace

```sh
# Create a namespace for the tool
conda create --name macs python=2.7

# Activate the new environment.
source activate macs

# Install the tools.
conda install numpy
conda install macs2
```

with

```sh
conda create -n macs bioconda::macs2=2.2.7.1
```
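
A quick way to confirm the new environment works (a sketch; it assumes the `conda create` command above finished without errors):

```sh
conda activate macs
macs2 --version
```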
111 changes: 111 additions & 0 deletions versioned_docs/version-22u/chip-seq/nextflow.md
@@ -0,0 +1,111 @@
---
id: nextflow
title: Nextflow
description: 'Data-driven computational pipelines'
sidebar_label: 'Nextflow'
sidebar_position: 2
---

## Workflow managers

The `Makefile` has been getting a little scary. It's great for one-off commands
for a project, but not so much for full-blown data pipelines. There are plenty
of more modern alternatives:

- [CWL](https://www.commonwl.org/user_guide/index.html)
- [WDL](https://github.com/openwdl/wdl)
- [Snakemake](https://snakemake.readthedocs.io/en/stable/)
- [Nextflow](https://www.nextflow.io/)

## What is Nextflow?

Nextflow is an incredibly powerful and flexible workflow language. It's mainly
used for bioinformatics analysis.

```groovy title="main.nf"
/*
* Default pipeline parameters. They can be overriden on the command line eg.
* given `params.foo` specify on the run command line `--foo some_value`.
*/
params.reads = "$baseDir/data/ggal/*_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
params.multiqc = "$baseDir/multiqc"
log.info """\
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: ${params.transcriptome}
reads : ${params.reads}
outdir : ${params.outdir}
"""
// import modules
include { RNASEQ } from './modules/rnaseq'
include { MULTIQC } from './modules/multiqc'
/*
* main script flow
*/
workflow {
read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
RNASEQ( params.transcriptome, read_pairs_ch )
MULTIQC( RNASEQ.out, params.multiqc )
}
/*
* completion handler
*/
workflow.onComplete {
log.info ( workflow.success ? "\nDone! Open the following report in your browser --> $params.outdir/multiqc_report.html\n" : "Oops .. something went wrong" )
}
```

The thing that sets Nextflow apart is that it _pushes_ the data through the
pipeline, rather than _pulling_ it through like make.
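
Here's a minimal sketch of that push model (the `COUNT_LINES` process and the file glob are made up for illustration): each file the channel emits is pushed straight into the process as soon as it's available, instead of a target asking for its prerequisites the way `make` does.

```groovy
// Hypothetical toy process, just to show data being pushed through a channel.
process COUNT_LINES {
    input:
    path reads

    output:
    stdout

    script:
    """
    wc -l $reads
    """
}

workflow {
    // Each matching file is emitted onto the channel and flows into the process.
    channel.fromPath("$baseDir/data/ggal/*.fq") | COUNT_LINES | view
}
```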

## Subworkflows

```groovy title="./modules/rnaseq.nf"
params.outdir = 'results'
include { INDEX } from './index'
include { QUANT } from './quant'
include { FASTQC } from './fastqc'
workflow RNASEQ {
take:
transcriptome
read_pairs_ch
main:
INDEX(transcriptome)
FASTQC(read_pairs_ch)
QUANT(INDEX.out, read_pairs_ch)
emit:
QUANT.out | concat(FASTQC.out) | collect
}
```

## Modules

```groovy title="./modules/index.nf"
process INDEX {
tag "$transcriptome.simpleName"
input:
path transcriptome
output:
path 'index'
script:
"""
salmon index --threads $task.cpus -t $transcriptome -i index
"""
}
```

[The full nextflow/rnaseq-nf example repo](https://github.com/nextflow-io/rnaseq-nf)
144 changes: 144 additions & 0 deletions versioned_docs/version-22u/chip-seq/nf-core.md
@@ -0,0 +1,144 @@
---
id: nf-core
title: nf-core
description: 'A community effort to collect a curated set of analysis pipelines built using Nextflow.'
sidebar_label: 'nf-core'
sidebar_position: 3
---

## nf-core Intro

<iframe width="560" height="315" src="https://www.youtube.com/embed/gUM9acK25tQ"
title="YouTube video player" frameborder="0" allow="accelerometer; autoplay;
clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen></iframe>

> A community effort to collect a curated set of analysis pipelines built using
> Nextflow.

We have the genomics core, the imaging core, and other core facilities, and now we have nf-core!

Enough talk, let's run it!

### Testing a pipeline

[nf-core installation docs](https://nf-co.re/usage/installation)

1. Move into your chipseq repo
2. Install Nextflow

```bash
curl -fsSL get.nextflow.io | bash
mv nextflow ~/bin
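
# Optional sanity check (assumes ~/bin is already on your PATH):
nextflow -version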
```

3. Load the Singularity module

```bash
ml load singularity
```

4. Run

```bash
nextflow run nf-core/chipseq -profile test,utd_sysbio -r dev --outdir test-run
```

5. Update your `.gitignore`

```gitignore
.nextflow*
work/
data/
results/
```

## Running the nf-core pipeline

[Let's refer to the usage section of the pipeline's docs](https://nf-co.re/chipseq/dev/usage)

### Using the nf-core launcher

1. [Open up the nf-core launch utility](https://nf-co.re/launch?)
2. Select the `chipseq` pipeline, `dev` for the version and click Launch
3. Fill out the following command-line flags:

- profile: `utd_sysbio`
- input: `samplesheet.csv`
- email: `<netid>@utdallas.edu`
- read_length: 50
- genome: `hg19`

4. Create an `nf-params.json` file with the contents it generates.

```json title="nf-params.json"
{
"input": "samplesheet.csv",
"read_length": 50,
"outdir": "ming-results",
"email": "<netid>@utdallas.edu",
"genome": "hg19"
}
```

5. We're going to need to create a samplesheet. [Please refer to the usage section of the pipeline's docs](https://nf-co.re/chipseq/dev/usage)

The data has been predownloaded for you to the class scratch directory
`/scratch/applied-genomics/` under `chipseq/ming/`.

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2,antibody,control
WT_YAP1,/scratch/applied-genomics/chipseq/ming/SRR1810900.fastq.gz,,YAP1,WT_INPUT
WT_H3K27ac,/scratch/applied-genomics/chipseq/ming/SRR949140.fastq.gz,,H3K27ac,WT_INPUT
WT_INPUT,/scratch/applied-genomics/chipseq/ming/SRR949142.fastq.gz,,,
```

:::tip
If you can't get the formatting right for whatever reason, there's a backup samplesheet at `/scratch/applied-genomics/chipseq/ming/samplesheet.csv`; you just need to update the `input` path.
:::

6. Start `screen`, a terminal multiplexer that keeps your session running even if you disconnect

```bash
login$ screen
```

:::info
Useful screen commands
:::

```bash
# Start a new screen session:
screen

# Start a new named screen session:
screen -S session_name

# Reattach to an open screen:
screen -r session_name

# Detach from inside a screen:
Ctrl + A, D

# Kill the current screen session:
Ctrl + A, K
```

7. Launch the pipeline

```bash
nextflow run nf-core/chipseq -r dev -profile utd_sysbio -params-file nf-params.json
```

The pipeline should start up, and email you when it's finished!
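
If you'd rather not wait for the email, here are a couple of ways to peek at the run (a sketch, assuming you launched it inside `screen` from the run directory):

```bash
# Reattach to the screen session showing the live Nextflow progress output.
screen -r

# Or, from a new shell in the run directory, list the runs Nextflow knows about.
nextflow log
```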

While we're waiting, let's check out the [shell script that would have run all of that](https://www.biostarhandbook.com/ming-tangs-guide-to-chip-seq-analysis.html#shell-script-comes-to-rescue)

## Download the MultiQC Report

1. Open up the file explorer, navigate to
   `results/multiqc/multiqc_report.html`, _right-click_ the HTML
   file, and select Download.
2. Now that the MultiQC report is on your local computer, open it in a web
   browser, preferably next to the [pipeline's output
   docs](https://nf-co.re/chipseq/dev/output).
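
If you prefer the terminal over the file explorer, `scp` from your **local** machine works too (a sketch; the remote path below is a placeholder, so adjust it to your run directory and whatever `--outdir` you used):

```bash
# Placeholder path: point it at your actual run directory and outdir.
scp <netid>@sysbio.utdallas.edu:~/chipseq/results/multiqc/multiqc_report.html .
```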
53 changes: 53 additions & 0 deletions versioned_docs/version-22u/misc/code_alternatives.md
@@ -0,0 +1,53 @@
---
id: code_alternatives
title: VS Code Alternatives
description: ''
sidebar_label: 'VS Code Alternatives'
sidebar_position: 1
---

Due to changes in some of the UT Dallas systems, we're going to cover some extra
methods to log in, just in case. [Refer to Environment setup](../week-1) for
alternatives.

Windows:

- [Windows Terminal](https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701?activetab=pivot:overviewtab#)
- [git for Windows](https://gitforwindows.org/)
- [MobaXTerm and VS Code Setup](https://www.youtube.com/watch?v=GmMsTc55gLI)

MacOS:

- [iTerm2](https://iterm2.com/)

Once installed, open up a terminal, and try logging in.

:::danger
When typing in your password, there won't be any \*'s; the line will just stay blank. This is normal.
:::

```bash
ssh <netid>@sysbio.utdallas.edu
```

## Create SSH Keys

While we're at it, let's generate SSH keys so we don't have to type in our
password every time, and so we can use them with our Git repos as well. Public-key
authentication is more secure than passwords and has other advantages too.

[GitHub Docs for generating a new SSH key](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)
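
The GitHub docs walk through it, but the key-generation step itself is a one-liner (a sketch, assuming OpenSSH is installed; press Enter to accept the default file location):

```bash
ssh-keygen -t ed25519 -C "<netid>@utdallas.edu"
```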

First, add the public key to your GitHub account. Then copy it to the remote machine:

### Windows

```bash
scp C:\Users\username\.ssh\id_ed25519.pub <netid>@sysbio.utdallas.edu:~/.ssh/authorized_keys
```

### MacOS

```bash
scp ~/.ssh/id_ed25519.pub <netid>@sysbio.utdallas.edu:~/.ssh/authorized_keys
```
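
Note that copying straight to `~/.ssh/authorized_keys` overwrites any keys already there. Where `ssh-copy-id` is available (it ships with most OpenSSH installs), it appends instead, which is usually safer:

```bash
ssh-copy-id -i ~/.ssh/id_ed25519.pub <netid>@sysbio.utdallas.edu
```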
