Update compute instructions and update links #68

Merged: 3 commits, Oct 25, 2023
16 changes: 8 additions & 8 deletions .pre-commit-config.yaml
@@ -9,7 +9,7 @@ repos:
always_run: true
fail_fast: true
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: check-merge-conflict
- id: debug-statements
@@ -19,7 +19,7 @@ repos:
- id: check-yaml
- id: trailing-whitespace
- repo: https://github.com/asottile/reorder_python_imports
rev: v3.10.0
rev: v3.12.0
hooks:
- id: reorder-python-imports
args:
@@ -28,27 +28,27 @@ repos:
--unclassifiable-application-module=_tskit,
]
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.0.282"
rev: "v0.1.2"
hooks:
- id: ruff
args:
[
"--per-file-ignores=tests/test_utils.py:E501,manticore/tests/wm/snakemake.py:E501",
]
- repo: https://github.com/psf/black
rev: 23.7.0
rev: 23.10.1
hooks:
- id: black
language_version: python3
- repo: https://github.com/asottile/blacken-docs
rev: 1.15.0
rev: 1.16.0
hooks:
- id: blacken-docs
args: [--skip-errors]
additional_dependencies: [black==22.3.0]
language_version: python3
- repo: https://github.com/DavidAnson/markdownlint-cli2
rev: v0.8.1
rev: v0.10.0
hooks:
- id: markdownlint-cli2
files: \.(md|qmd)$
@@ -59,7 +59,7 @@ repos:
types: [file]
exclude: LICENSE.md
- repo: https://github.com/editorconfig-checker/editorconfig-checker.python
rev: '2.7.2'
rev: '2.7.3'
hooks:
- id: editorconfig-checker
alias: ec
@@ -71,7 +71,7 @@ repos:
docs/slides/population_structure/data/DavidReich/AADR\.geno
)
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.3.2.9019
rev: v0.3.2.9023
hooks:
- id: style-files
name: style-files
6 changes: 3 additions & 3 deletions Makefile
@@ -40,16 +40,16 @@ install-pgip: ## Create pgip environment and install packages to render site
if conda env list | cut -f1 -d ' ' | grep -q "${PGIP}$$"; then \
echo "Environment exists; skipping install"; \
else \
mamba env create -n ${PGIP} --file environment.yml; \
mamba env create -n ${PGIP} conda-linux-64.lock; \
fi;

install-dev: install-pgip ## Install additional development tools
mamba env update -n ${PGIP} --file environment-dev.yml

install-R: install-pgip ## Install additional R packages that require manual installation
R -e "install.packages('dotenv', repos=c(CRAN = 'https://cran.rstudio.com/'))"
R -e "tinytex::tlmgr_update()"
R -e "tinytex::reinstall_tinytex(force=TRUE)"
#R -e "tinytex::tlmgr_update()"
#R -e "tinytex::reinstall_tinytex(force=TRUE)"
R -e "library(devtools); devtools::install_local('src/R/pgip')"

install-kernels: install-pgip ## Install python and bash kernel
10 changes: 6 additions & 4 deletions README.md
@@ -58,7 +58,6 @@ environment variable:
conda activate pgip
make install-R
make install-kernels
make install-bcftools
make install-pixy
make install-dev

@@ -73,9 +72,9 @@ issues](#installation-issues).

### Create pgip conda environment

Create a conda environment called `pgip` using the environment file
Create a conda environment called `pgip` using the conda lock file

mamba env create --file environment.yml
mamba env create -n pgip conda-linux-64.lock

and activate the environment

@@ -123,7 +122,10 @@ information.

[Install Quarto](https://quarto.org/docs/get-started) version [Quarto\>=1.2.475](https://quarto.org/docs/download/).

### bcftools manual install
### bcftools manual install (OBSOLETE)

**OBSOLETE**: installing pgip using `conda-linux-64.lock` obviates the
need to install bcftools manually.

Due to dependency issues, bcftools has to be manually installed:

66 changes: 55 additions & 11 deletions conda-linux-64.lock

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions docs/_data.yml
@@ -115,5 +115,13 @@ exercises/variant_filtering:
- __variants__
- __redyellow_variants__

exercises/genetic_diversity:
__data__:
- __sampleinfo__
allsites.vcf.gz: monkeyflower/tiny/allsites.vcf.gz
allsites.vcf.gz.tbi: monkeyflower/tiny/allsites.vcf.gz.tbi
redyellow.vcf.gz: monkeyflower/tiny/redyellow.allsites.vcf.gz
redyellow.vcf.gz.tbi: monkeyflower/tiny/redyellow.allsites.vcf.gz.tbi

slides/foundations:
data/Homo_sapiens: Homo_sapiens
6 changes: 5 additions & 1 deletion docs/_variables.yml
@@ -5,8 +5,12 @@ lab:
coalescent: exercises/simulation/index.html#sec-exercise-simulation-coalescent
msprime: exercises/simulation/index.html#sec-exercise-simulation-msprime

webexport: https://export.uppmax.uu.se/uppstore2017171/workshops/pgip/monkeyflower
uppmaxproject: naiss2023-22-1084

webexport:
baseurl: https://export.uppmax.uu.se
user: pgip
url: https://export.uppmax.uu.se/naiss2023-22-1084

demes:
ooa: pgip_data/data/ooa/ooa.demes.yaml
99 changes: 97 additions & 2 deletions docs/assets/bibliography.bib

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/assets/css/custom.scss
@@ -33,7 +33,7 @@ $presentation-slide-text-align: left !default;

$primary: $lime;
$secondary: $lime75;
$link-color: $lime75;
$link-color: $teal;
$body-color: $black;
$code-block-bg: WhiteSmoke;

155 changes: 106 additions & 49 deletions docs/exercises/compute_environment/index.qmd
@@ -13,67 +13,79 @@ format: html

<!-- markdownlint-enable MD041 -->

Computer exercise requirements are listed in `Tools` callout blocks in
each exercise. We will primarily use gitpod and JupyterLite for
exercises, which ideally means you do not need to install any software
dependencies yourself. The `Tools` callout block contains listings of
programs, along with specifications for two fallback solutions,
whenever relevant:
## UPPMAX

1. a [conda](https://docs.conda.io/en/latest/) environment file
2. a list of [UPPMAX](https://www.uppmax.uu.se) modules
::: {.callout-important collapse=true}

::: {.callout-note collapse=true}
#### Prerequisite: UPPMAX account

## Tools - example

Example Tools block.
To run exercises on UPPMAX you need an
account. You can apply for an account
[here](https://www.uppmax.uu.se/support/getting-started/applying-for-a-user-account/).

:::{.panel-tabset}
:::

### Listing
We will primarily be using Uppsala's high-performance computing (HPC)
center [UPPMAX](https://www.uppmax.uu.se/) to run exercises. Course
material will be hosted in a dedicated course project directory
`/proj/{{< var uppmaxproject >}}`.

- [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [bwa](https://github.com/lh3/bwa) [@li_AligningSequenceReads_2013]
We recommend you set up a working directory based on your username in
`/proj/{{< var uppmaxproject >}}/users` in which to run your
exercises:

### Conda
```bash
mkdir -p /proj/{{< var uppmaxproject >}}/users/YOURUSERNAME
cd /proj/{{< var uppmaxproject >}}/users/YOURUSERNAME
```
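
As a minimal sketch of the directory layout described above, the path
can be derived from the project ID and the current username instead of
being typed by hand. The helper name `workdir_for` is hypothetical and
not part of the course material; the project ID is the one used in this
course.

```bash
# Sketch only: PROJECT is the course project ID; the helper name
# workdir_for is a hypothetical illustration.
PROJECT="naiss2023-22-1084"

workdir_for() {
    # Use the first argument if given, otherwise the current username
    local user="${1:-${USER:-$(id -un)}}"
    printf '/proj/%s/users/%s\n' "$PROJECT" "$user"
}

# Show the directory that would be used for the current user
echo "Working directory: $(workdir_for)"
```

On UPPMAX, the printed path could then be passed to `mkdir -p` and `cd`
as in the commands above.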

Copy the contents to a file `environment.yml` and install packages
with `mamba env update -f environment.yml`.
All computations should be run on a compute node. You can request an
[interactive
session](https://www.uppmax.uu.se/support/faq/running-jobs-faq/how-can-i-run-interactively-on-a-compute-node/)
with the `interactive` command. For example, to request an eight-hour
job on 4 cores, run

```{lang="text" }
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bwa=0.7.17
- fastqc=0.12.1
```bash
interactive -A {{< var uppmaxproject >}} --cores 4 \
--partition core --time 08:00:00
```

### UPPMAX modules
::: {.callout-important}

Execute the following command to load modules:
#### Please do not book more than 4 cores

```{bash }
#| label: uppmax-load-modules
#| echo: true
#| eval: false
module load uppmax bioinfo-tools bwa/0.7.17 FastQC/0.11.9
```
We have privileged access to a limited number of nodes. Please do not
book more than 4 cores, or your fellow students will experience
long waiting times.

:::
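
One way to guard against accidentally requesting too many cores is a
small wrapper that checks the core count before building the
`interactive` command. The following is a sketch under stated
assumptions: the function name `build_interactive_cmd` and the variable
`MAX_CORES` are hypothetical, the flags are copied from the example
command in this section, and the function only prints the command
rather than running it.

```bash
# Hypothetical helper (not part of the course material): print the
# `interactive` command for a project, core count and wall time, and
# refuse requests above the 4-core limit.
MAX_CORES=4

build_interactive_cmd() {
    local project="$1" cores="$2" walltime="$3"
    if [ "$cores" -gt "$MAX_CORES" ]; then
        echo "error: please request at most ${MAX_CORES} cores" >&2
        return 1
    fi
    echo "interactive -A ${project} --cores ${cores} --partition core --time ${walltime}"
}

# Print (but do not run) the command for a 4-core, eight-hour session
build_interactive_cmd naiss2023-22-1084 4 08:00:00
```

Printing the command first makes it easy to review the request before
submitting it on a login node.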

::: {.callout-important}

#### Make sure to log in to a compute node before running any heavy commands

:::

## gitpod
### Tutorials

UPPMAX hosts tutorials and user guides at
<https://www.uppmax.uu.se/support/user-guides/>. In particular,
<https://www.uppmax.uu.se/support/user-guides/guide--first-login-to-uppmax/>
has information on how to connect to and work on UPPMAX.

## JupyterLite

## conda
Some exercises will be run using
[JupyterLite](https://github.com/jupyterlite/jupyterlite) which is a
JupyterLab distribution that runs entirely in the browser. Apart from
having a browser, no preparations are necessary. Note that some users
have reported issues with Firefox and that Google Chrome may be a
better solution.

## Fallback solution: conda

The first fallback option is to install software packages locally on
your computer. We will use the
In case there are issues with the HPC, a fallback option is to install
software packages locally on your computer. We will use the
[conda](https://docs.conda.io/en/latest/) package manager to install
necessary requirements from the package repositories
[bioconda](https://bioconda.github.io/) and
@@ -145,19 +157,64 @@ or if you have packages listed in an environment file
mamba env update -f environment.yml
```

## UPPMAX
## Tools

As another fallback option, you may need an [uppmax
account](https://www.uppmax.uu.se/) to run some of the exercises. You
can apply for an account
[here](https://www.uppmax.uu.se/support/getting-started/applying-for-a-user-account/).
Computer exercise requirements are listed in `Tools` callout blocks in
each exercise. The `Tools` callout block contains listings of
programs, along with package dependencies and specifications for
UPPMAX and conda, whenever relevant. An example block is shown below.

### Tutorials
::: {.callout-note collapse=true}

Look at <https://www.uppmax.uu.se/support/user-guides/>, in particular
<https://www.uppmax.uu.se/support/user-guides/guide--first-login-to-uppmax/>
for information on how to connect to and work on uppmax.
### Tools - example

Example Tools block.

:::{.panel-tabset}

#### Listing

Provides a list of packages, each linked to its repository, with a
citation when available.

## Brief introduction to bash
- [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [bwa](https://github.com/lh3/bwa) [@li_AligningSequenceReads_2013]

#### UPPMAX modules

Provides command and instructions to load relevant UPPMAX modules.

Example:

```{bash }
#| label: uppmax-load-modules
#| echo: true
#| eval: false
module load uppmax bioinfo-tools bwa/0.7.17 FastQC/0.11.9
```

:::

:::

#### Conda

Provides a [conda environment
file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
that lists dependencies and where to retrieve them.

To install, copy the contents in the code block to a file
`environment.yml` and install packages with `mamba env update -f
environment.yml`.

```{lang="text" }
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- bwa=0.7.17
- fastqc=0.12.1
```
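
The copy-and-install step can also be scripted. The sketch below
writes the example environment file with a here-document and reports
the install command instead of running it. The scratch directory and
the variable names `workdir` and `env_file` are arbitrary choices for
illustration, and the channel name is written as `defaults`, on the
assumption that this is the intended spelling.

```bash
# Sketch: write the example environment file to a scratch directory.
# The scratch location is an arbitrary choice for illustration.
workdir="$(mktemp -d)"
env_file="${workdir}/environment.yml"

cat > "$env_file" <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bwa=0.7.17
  - fastqc=0.12.1
EOF

# Report the install command rather than running it
if command -v mamba >/dev/null 2>&1; then
    echo "Run: mamba env update -f ${env_file}"
else
    echo "mamba not found; environment file written to ${env_file}"
fi
```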

## References
21 changes: 10 additions & 11 deletions docs/exercises/datasets/monkeyflowers.qmd
@@ -97,9 +97,10 @@ of natural selection (such as rapid adaptation) are responsible for
the similarities between genomic landscapes.

A locus that previously had been associated with differentiation of
red and yellow ecotypes was investigated in more detail. We will be
using this region, located on linkage group 4
(`LG4:11,000,000-14,000,000`), in the exercises.
red and yellow ecotypes was investigated in more detail. The locus is
located on linkage group 4 (`LG4`), and we will be using both a 3 Mbp
region of interest (ROI) surrounding the locus, and the whole linkage
group, for different exercises.

## Data

@@ -130,21 +131,19 @@ data[, c(1, 2, 3, 4, 7, 8, 9)] %>%
)
```

### UPPMAX webexport
### UPPMAX data storage

FIXME: Add link to datasets

Read files and reference sequences for the ROI are hosted at UPPMAX
webexport.
The monkeyflower dataset is located in UPPMAX project
`{{< var uppmaxproject >}}` at
`/proj/{{< var uppmaxproject >}}/webexport/monkeyflower`. In addition
to local access, data can be accessed remotely through
[{{< var webexport.url >}}]({{< var webexport.url >}}/).

### GitHub

The GitHub repository
[pgip-data](https://github.com/NBISweden/pgip-data) contains reference
sequence and read data for 37 monkeyflower individuals for the region
`LG4:12,000,000-12,100,000`. The smaller region is motivated by the
fact that even the ROI generates files that are too large to commit to
github. The data resides in the
`LG4:12,000,000-12,100,000`. The data resides in the
[data/monkeyflower/tiny](https://github.com/NBISweden/pgip-data/tree/main/data/monkeyflower/tiny)
subdirectory. This data set is used as input data to render the
website.