Update Git, Conda and Snakemake tutorials #272

Merged
merged 34 commits into from
Nov 20, 2024
1a74fe1
Minor text change
johnne Nov 5, 2024
958038e
Minor change to wording
johnne Nov 6, 2024
9760e01
Minor word change
johnne Nov 6, 2024
d857eb1
Update conda instructions for new macs
johnne Nov 8, 2024
68a9334
Merge branch 'main' into js
johnne Nov 8, 2024
e7f8742
Minor rewording
johnne Nov 8, 2024
e4b1a22
Fix misspelling
johnne Nov 8, 2024
6f397dd
Merge branch 'main' into js
johnne Nov 12, 2024
aa8b9ac
Fix callout
johnne Nov 12, 2024
a2ca8ba
Remove duplicate wording
johnne Nov 12, 2024
abee898
Add details about conda channels in basics section
johnne Nov 13, 2024
d516d5a
Fix link
johnne Nov 13, 2024
4c48bdc
Minor update to instructions
johnne Nov 13, 2024
98f7ec7
Add callout about M1 python
johnne Nov 13, 2024
beba69e
Remove deprecated flags
johnne Nov 13, 2024
73c161d
Move callout for snakemake env
johnne Nov 13, 2024
f25d250
Update dry run output
johnne Nov 13, 2024
138e4dc
Start rewrite of basics section
johnne Nov 14, 2024
fa52043
Merge branch 'main' into js
johnne Nov 15, 2024
6bcbf1e
Change snakemake command line
johnne Nov 15, 2024
fbe91e5
Update visualization section
johnne Nov 15, 2024
47e4ac9
Fix invalid escape sequence
johnne Nov 15, 2024
777abfe
Revert "Fix invalid escape sequence"
johnne Nov 15, 2024
1edfc86
Fix invalid escape sequence
johnne Nov 15, 2024
38b908e
Update up to Generalising workflows section
johnne Nov 16, 2024
ec7cf93
Add filename to code blocks
johnne Nov 16, 2024
797e0da
Remove rulegraph rule
johnne Nov 17, 2024
f0afd64
Update starting point for MRSA workflow
johnne Nov 18, 2024
e21277d
Streamline MRSA generalization
johnne Nov 19, 2024
5dba2e3
Rewrite snakemake tutorial
johnne Nov 19, 2024
d80cf55
Update finished Snakemake workflow
johnne Nov 19, 2024
415c970
Set smaller fontsize for code
johnne Nov 19, 2024
7c9c62f
Merge branch 'main' into js
johnne Nov 19, 2024
cbe1e23
Set min Snakemake version 8
johnne Nov 19, 2024
4 changes: 4 additions & 0 deletions assets/css/styles.scss
@@ -116,6 +116,10 @@ $footer-fg: DimGray;

/* code */

pre, code, .sourceCode {
font-size: 0.75rem;
}

pre {
line-height: 1.4;
background-color: $code-block-bg;
38 changes: 24 additions & 14 deletions home_precourse.qmd
@@ -284,38 +284,48 @@ into some problems with certain Conda packages that have not yet been built for
the ARM64 architecture. The [Rosetta](https://support.apple.com/en-us/HT211861)
software allows ARM64 Macs to use software built for the old AMD64 architecture,
which means you can always fall back on creating AMD/Intel-based environments
and use them in conjunction with Rosetta. This is how you do it:
and use them in conjunction with Rosetta. This can be done by specifying
`--subdir osx-64` when creating the environment, _e.g._:

```bash
conda env create -f <path-To-Environment.yml> --subdir osx-64
```

or

```bash
conda create -n myenv <packages...> --subdir osx-64
```

::: {.callout-important}
To make sure that subsequent installations into this environment also use the
`osx-64` architecture, activate the environment and then run:

```bash
CONDA_SUBDIR=osx-64 <conda-command>
conda activate <env>
conda config --env --set subdir osx-64
```
:::

The first command creates the Intel-based environment, while the last one
makes sure that subsequent commands are also using the Intel architecture. If
you don't want to remember and do this manually each time you want to use
AMD64/Rosetta you can check out [this bash script](https://github.com/fasterius/dotfiles/blob/main/scripts/intel-conda-env.sh).

## Installing Snakemake

We will use Conda environments to set up this tutorial, but don't worry
if you don't understand exactly what everything does - you'll learn all the
details during the course. First make sure you're currently situated inside the
tutorials directory (`workshop-reproducible-research/tutorials`) and then create
the Conda environment like so:
and activate the Conda environment with the commands below:

::: {.callout-caution title="ARM64 users"}
Some of the packages in this environment are not available for the ARM64
architecture, so you'll have to add `--subdir osx-64` to the `conda env create`
command. See the [instructions above](#conda-on-new-macs) for more details.
:::

```bash
conda env create -f snakemake/environment.yml -n snakemake-env
conda activate snakemake-env
```
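
The choice of whether to add `--subdir osx-64` can also be made by a small shell helper. The sketch below is only an illustration: the function name `conda_subdir_flag` is hypothetical (it is not part of Conda or of the course material), and it assumes that `uname -m` reports `arm64` on Apple Silicon Macs, which is not the case if the terminal itself runs under Rosetta. The command is printed rather than executed so that you can inspect it first.

```bash
# Hypothetical helper (not part of Conda or the tutorial): print the extra
# flag needed on ARM64 Macs and print nothing on other platforms.
# The OS and architecture can be passed as arguments; by default they are
# taken from `uname`.
conda_subdir_flag() {
    local os="${1:-$(uname -s)}"
    local arch="${2:-$(uname -m)}"
    if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
        echo "--subdir osx-64"
    fi
}

# Print (rather than run) the resulting command so it can be checked first.
echo "conda env create -f snakemake/environment.yml -n snakemake-env $(conda_subdir_flag)"
```

On an Intel Mac or a Linux machine the helper prints nothing, so the command is the same as the one in the tutorial.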

::: {.callout-caution title="ARM64 users"}
Some of the packages in this environment is not available for the ARM64
architecture, so you'll have to follow the [instructions
above](#conda-on-new-macs).
:::

Check that Snakemake is installed correctly, for example by executing `snakemake
--help`. This should output a list of available Snakemake settings. If you get
`bash: snakemake: command not found` then you need to go back and ensure that
86 changes: 64 additions & 22 deletions pages/conda.qmd
@@ -41,8 +41,7 @@ A Conda _environment_ is essentially a directory that is added to your PATH and
that contains a specific collection of packages that you have installed.
Packages are symlinked between environments to avoid unnecessary duplication.

::: {.callout-}
**Different Conda flavours**
::: {.callout-note title="Different Conda flavours"}
You may come across several flavours of Conda. There's _Miniconda_, which is
the installer for Conda. The second is _Anaconda_, which is a distribution of
not only Conda, but also over 150 scientific Python packages curated by the
@@ -52,14 +51,15 @@ not even use. Then, lastly, there's the _Miniforge_ flavour that we're using
here, which is a community-driven version of Conda that's highly popular
within the scientific community.

The difference between Miniconda and Miniforge is that the former points to
points to the `default` channel by default (which requires an Anaconda license
for commercial purposes), while the latter points to the community-maintained
`conda-forge` channel by default. While Conda is created and owned by Anaconda
the company, Conda itself is open source - it's the `default` channel that is
proprietary. The `conda-forge` and `bioconda` channels (two of the largest
channels outside of `default`) are community-driven. Confusing? Yes. If you
want this information more in-depth you can read this [blog post by Anaconda](https://www.anaconda.com/blog/is-conda-free).
The difference between Miniconda and Miniforge is that the former points to the
`default` channel by default (which requires an Anaconda license for commercial
purposes), while the latter points to the community-maintained `conda-forge`
channel by default. While Conda is created and owned by Anaconda the company,
Conda itself is open source - it's the `default` channel that is proprietary.
The `conda-forge` and `bioconda` channels (two of the largest channels outside
of `default`) are community-driven. Confusing? Yes. If you want this information
more in-depth you can read this [blog post by
Anaconda](https://www.anaconda.com/blog/is-conda-free).
:::

## The basics
@@ -80,12 +80,43 @@ called _Project A_.
- Let's make our first Conda environment:

```bash
conda create -n project_a -c bioconda fastqc
conda create -n project_a fastqc
```

This will create an environment called `project_a`, containing FastQC from the
Bioconda channel. Conda will list the packages that will be installed and ask
for your confirmation.
This will create an environment called `project_a`, containing FastQC.

You should see something like this printed to the terminal:

```
Channels:
- conda-forge
- bioconda
- defaults
Platform: osx-arm64 # <- Your platform may differ
```

This is Conda telling you which channels it is looking in for the package you
requested. If you followed the
[pre-course](../home_precourse.html#configuring-conda) instructions and added
the `conda-forge` and `bioconda` channels to your conda configuration, you
should see them listed here. If you had not configured Conda to use these
channels, you would have to specify them when installing packages, _e.g._ `conda
install -c bioconda fastqc`.
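
If you are unsure whether a channel is configured, you can check the channel list yourself. The sketch below assumes that `conda config --show channels` prints one `  - <channel>` line per channel; the helper name `has_channel` is hypothetical, and a sample listing stands in for the real command output so that the example runs without Conda installed.

```bash
# Hypothetical helper: succeed if channel $1 appears in a
# `conda config --show channels` listing read from standard input.
has_channel() {
    grep -q -x "[[:space:]]*-[[:space:]]*$1"
}

# Sample listing standing in for real `conda config --show channels` output.
sample='channels:
  - conda-forge
  - bioconda
  - defaults'

for ch in conda-forge bioconda; do
    if printf '%s\n' "$sample" | has_channel "$ch"; then
        echo "$ch: configured"
    else
        echo "$ch: missing, pass -c $ch explicitly"
    fi
done
```

With a real installation you would pipe the command output into the helper instead, _e.g._ `conda config --show channels | has_channel bioconda`.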

Further down you will see something like this:

```
The following NEW packages will be INSTALLED:

fastqc bioconda/noarch::fastqc-0.12.1-hdfd78af_0
<more packages>
```

This shows that the `fastqc` package will be installed from the `bioconda`
channel (the `noarch` part shows that the package is not specific to any
computer architecture). In the example above we see that version `0.12.1` will
be installed. The `hdfd78af_0` part is a unique build string that is used to
differentiate between different builds of the same package version.
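
To make the parts of such a package line concrete, here is a minimal sketch that splits the example string using plain shell parameter expansion. It assumes the `channel/subdir::name-version-build` layout shown above, and that neither the version nor the build string contains a hyphen (package names may contain hyphens, which is why the string is split from the right).

```bash
# Package line taken from the example output above.
spec="bioconda/noarch::fastqc-0.12.1-hdfd78af_0"

source_part="${spec%%::*}"      # everything before "::"  -> bioconda/noarch
dist="${spec#*::}"              # everything after "::"   -> fastqc-0.12.1-hdfd78af_0

channel="${source_part%%/*}"    # part before the slash   -> bioconda
subdir="${source_part#*/}"      # part after the slash    -> noarch
build="${dist##*-}"             # text after last hyphen  -> hdfd78af_0
name_version="${dist%-*}"       # drop the build string   -> fastqc-0.12.1
version="${name_version##*-}"   # text after last hyphen  -> 0.12.1
name="${name_version%-*}"       # what remains            -> fastqc

echo "channel=$channel subdir=$subdir name=$name version=$version build=$build"
```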

- Once it is done, you can activate the environment:

@@ -142,21 +173,21 @@ activated.
install`. Make sure that `project_a` is the active environment first.

```bash
conda install -c bioconda multiqc
conda install multiqc
```

- If we don't specify the package version, the latest available version will be
installed. What version of MultiQC got installed?
- Run the following to see what versions are available:

```bash
conda search -c bioconda multiqc
conda search multiqc
```

- Now try to install a different version of MultiQC, _e.g._:

```bash
conda install -c bioconda multiqc=1.13
conda install multiqc=1.13
```

Read the information that Conda displays in the terminal. It probably asks if
@@ -196,7 +227,7 @@ conda create -n project_old -c bioconda bbmap=37.10

Now let's try to remove an installed package from the active environment:

```
```bash
conda remove multiqc
```

@@ -290,9 +321,9 @@ Conda installation path.

- Activate the environment!

- Now we can run the code for the MRSA project found in `code/run_qc.sh`,
either by running `bash code/run_qc.sh` or by opening the `run_qc.sh` file
and executing each line in the terminal one by one. Do this!
- Now we can run the code for the MRSA project found in `code/run_qc.sh`
by running `bash code/run_qc.sh`. Do this! (You can also open the file
and run the commands manually if you prefer.)

This should download the project FASTQ files and run FastQC on them (as
mentioned above).
@@ -469,7 +500,18 @@ directory of the currently active environment.

When you create a new Conda environment you can choose to install a specific
version of Python in that environment as well. As an example, create an
environment containing Python version `3.5` by running:
environment containing Python version `3.5`:

::: {.callout-note}

Python versions older than `v3.8.5` are not available for Macs with the M-series
chips, so if you are using one of those you will need to add `--subdir osx-64`
to the command, _e.g._:

```bash
conda create --subdir osx-64 -n py35 python=3.5
```
:::

```bash
conda create -n py35 python=3.5
24 changes: 13 additions & 11 deletions pages/git.qmd
@@ -35,7 +35,7 @@ There are many benefits of using Git in your research project:
to collaborate by tracking all edits made by each person. It will also handle
any potential conflicting edits.
- Using a cloud-based repository hosting service (the one you push your commits
to), like _e.g._ [GitHub](https://github.com/) or
to), like [GitHub](https://github.com/) or
[Bitbucket](https://bitbucket.org/), adds additional features, such as being
able to discuss the project, comment on edits, or report issues.
- If at some point your project will be published, GitHub or Bitbucket (or
@@ -515,10 +515,10 @@ displays the difference on a word-per-word basis rather than line-per-line.

::: {.callout-note}
Git is constantly evolving, along with some of its commands. The `checkout`
command was previously used for switching between branches, but this
functionality now has the dedicated (and clearer) `switch` command for this.
If you've previously learned using `checkout` instead you can keep doing that
without any issues, as the `checkout` command itself hasn't changed.
command was previously used for switching between branches, but now there's the
dedicated (and clearer) `switch` command for this functionality. If you've
previously learned using `checkout` instead you can keep doing that without any
issues, as the `checkout` command itself hasn't changed.
:::
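
If you want to try both commands without touching the tutorial repository, the following sketch does so in a throwaway repository. The repository location, the placeholder identity and the empty commit are assumptions made only for this demonstration; `git switch` requires Git 2.23 or newer.

```bash
# Throwaway repository so the real tutorial repository is left untouched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main" on any Git version

# Placeholder identity, only for this demonstration repository.
git config user.name "Demo User"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "Initial commit"

git switch -q -c test_alignment     # create a new branch and switch to it
git switch -q main                  # switch back to main
git checkout -q test_alignment      # the older command switches branches too

current_branch="$(git branch --show-current)"
echo "Now on: $current_branch"
```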

Now, let's assume that we have tested our code and the alignment analysis is run
@@ -539,7 +539,7 @@ git merge test_alignment
```

Run `git log --graph --all --oneline` again to see how the merge commit brings
back the changes made in `test_alignment` to `main`.
the changes made in `test_alignment` into `main`.
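
The effect of a merge commit can also be reproduced in isolation. The sketch below builds a throwaway repository in which `main` and `test_alignment` each get one commit of their own before being merged; the file names, commit messages and identity are placeholders and are not taken from the tutorial.

```bash
# Throwaway repository with placeholder files and identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# One commit on the feature branch...
git switch -q -c test_alignment
echo "alignment" > align.txt
git add align.txt
git commit -q -m "Test alignment settings"

# ...and one on main, so that the two branches have diverged.
git switch -q main
echo "docs" > README.md
git add README.md
git commit -q -m "Add README on main"

# Because the branches diverged, this creates a merge commit with two parents.
git merge -q --no-edit test_alignment
git --no-pager log --graph --all --oneline

# Count the parents of the newest commit on main.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
echo "Parents of the merge commit: $parent_count"
```

Had `main` not received a commit of its own, Git would by default have fast-forwarded instead, and no merge commit would appear in the log.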

::: {.callout-tip}
If working on different features or parts of an analysis on different
@@ -718,9 +718,11 @@ git remote add origin git@github.com:user/git_tutorial.git
your local Git clone. The short name of the default remote is usually
"_origin_" by convention.

::: {.callout-note}
::: {.callout-note}
Make sure you've used an SSH address (_i.e._ starting with `git@github.com`)
rather than an HTTPS address (starting with `https://github.com`)!
rather than an HTTPS address (starting with `https://github.com`)! Also make
sure you've set up SSH keys as described in the
[GitHub setup](../home_precourse.html#github-setup) section of the pre-course material.
:::

- We have not yet synced the local and remote repositories, though, we've simply
@@ -896,8 +898,8 @@ a shorthand for `git fetch` followed by `git merge FETCH_HEAD` (where

That's quite a few concepts and commands you've just learnt! It can be a bit
hard to keep track of everything and the connections between local and remote
Git repositories and how you work with them, but hopefully the following figure
will give you a short visual summary:
Git repositories and how you work with them. The following figure will give you
a short visual summary:

![](images/git_overview_remote.png){ width=600px }
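
The two steps that `git pull` combines, `git fetch` followed by `git merge FETCH_HEAD`, can be tried out without GitHub. The sketch below is only an illustration: it uses a local directory as a stand-in for the remote, and the directory names (`upstream`, `local`), the file and the identity are all placeholders.

```bash
work="$(mktemp -d)"
cd "$work"

# A local repository playing the role of the remote.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
cd ..

# Clone it, then add a new commit on the "remote" side.
git clone -q upstream local
cd upstream
echo "v2" > notes.txt
git commit -q -am "Second version"
cd ..

cd local
git fetch -q origin          # download the new commit; the working tree is unchanged
before="$(cat notes.txt)"
git merge -q FETCH_HEAD      # integrate what was fetched (a fast-forward here)
after="$(cat notes.txt)"
echo "before merge: $before, after merge: $after"
```

Running `git pull` in `local` instead of the last `fetch` and `merge` commands would leave `notes.txt` in the same state.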

@@ -915,7 +917,7 @@ repositories and how to sync them:

### Remote branches

Remote branches work much in the same way a local branches, but you have to
Remote branches work much in the same way as local branches, but you have to
push them separately; you might have noticed that GitHub only listed our
repository as having one branch (you can see this by going to the _Code_ tab).
This is because we only pushed our `main` branch to the remote. Let's create