From 0a35b98ae493dd197b36770fe13aa360ff9c3101 Mon Sep 17 00:00:00 2001
From: Benjamin Rombaut
Date: Thu, 12 Sep 2024 10:33:29 +0200
Subject: [PATCH] add slides

---
 book/disk_based/disk_based_pipelines.qmd |  2 +-
 slides/slides.qmd                        | 95 ++++++++++++++++++++++++
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/book/disk_based/disk_based_pipelines.qmd b/book/disk_based/disk_based_pipelines.qmd
index 9ae160a..fede57f 100644
--- a/book/disk_based/disk_based_pipelines.qmd
+++ b/book/disk_based/disk_based_pipelines.qmd
@@ -197,4 +197,4 @@ docker run -it -v $(pwd)/usecase:/app/usecase -v $(pwd)/book:/app/book berombau/
 
 Another approach is to use **multi-package containers**. Tools like [Multi-Package BioContainers](https://midnighter.github.io/mulled/) and [Seqera Containers](https://seqera.io/containers/) can make this quick and easy, by allowing for custom combinations of packages.
 
-You can go a long way with a folder of notebooks or scripts and the right tools. But as your project grows more bespoke, it can be worth the effort to use a **[workflow framework](../workflow_frameworks)** like Nextflow or Snakemake to manage the pipeline for you.
+You can go a long way with a folder of notebooks or scripts and the right tools. But as your project grows more bespoke, it can be worth the effort to use a **[workflow framework](../workflow_frameworks)** like Viash, Nextflow or Snakemake to manage the pipeline for you.
diff --git a/slides/slides.qmd b/slides/slides.qmd
index b9982d9..6b31b39 100644
--- a/slides/slides.qmd
+++ b/slides/slides.qmd
@@ -340,6 +340,24 @@ adata
 
 # Disk-based interoperability
 
+Disk-based interoperability is a strategy for making tools written in different programming languages work together by **storing intermediate results in standardized, language-agnostic file formats**.
+
+Upsides:
+- Simple: just add lines to read and write the intermediate files
+- Modular scripts
+
+Downsides:
+- Increased disk usage
+- Less direct interaction, harder debugging
+
+# Important features of interoperable file formats
+
+- Compression
+- Sparse matrix support
+- Large images
+- Lazy chunk loading
+- Remote storage
+
 ## General single cell file formats of interest for Python and R
 
 {{< include ../book/disk_based/_general_file_formats.qmd >}}
@@ -348,6 +366,83 @@ adata
 
 {{< include ../book/disk_based/_specialized_file_formats.qmd >}}
+
+# Disk-based pipelines
+
+Script pipeline:
+```bash
+#!/bin/bash
+
+bash scripts/1_load_data.sh
+python scripts/2_compute_pseudobulk.py
+Rscript scripts/3_analysis_de.R
+```
+
+Notebook pipeline:
+```bash
+# Every step can be a new notebook execution with inspectable output
+jupyter nbconvert --to notebook --execute my_notebook.ipynb --allow-errors --output-dir outputs/
+```
+
+## Just stay in your language and call scripts
+```python
+import subprocess
+
+subprocess.run("bash scripts/1_load_data.sh", shell=True)
+# Alternatively, you can run Python code here instead of calling a Python script
+subprocess.run("python scripts/2_compute_pseudobulk.py", shell=True)
+subprocess.run("Rscript scripts/3_analysis_de.R", shell=True)
+```
+
+# Pipelines with different environments
+
+1. Interleave with environment (de)activation functions
+2. Use rvenv
+3. Use Pixi
+
+## Pixi to manage different environments
+
+```bash
+pixi run -e bash scripts/1_load_data.sh
+pixi run -e scverse scripts/2_compute_pseudobulk.py
+pixi run -e rverse scripts/3_analysis_de.R
+```
+
+## Define tasks in Pixi
+
+```bash
+...
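+# Note (illustrative comment, not part of the original pixi.toml): each
+# [feature.<name>.tasks] table defines the tasks of one feature. In Pixi, an
+# [environments] table in the same file typically maps environment names such
+# as "scverse" and "rverse" to these features, which is what `pixi run -e`
+# selects on the previous slide.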
+[feature.bash.tasks] +load_data = "bash book/disk_based/scripts/1_load_data.sh" +... +[feature.scverse.tasks] +compute_pseudobulk = "python book/disk_based/scripts/2_compute_pseudobulk.py" +... +[feature.rverse.tasks] +analysis_de = "Rscript --no-init-file book/disk_based/scripts/3_analysis_de.R" +... +[tasks] +pipeline = { depends-on = ["load_data", "compute_pseudobulk", "analysis_de"] } +``` +```bash +pixi run pipeline +``` + +## Also possible to use containers + +```bash +docker pull berombau/polygloty-docker:latest +docker run -it -v $(pwd)/usecase:/app/usecase -v $(pwd)/book:/app/book berombau/polygloty-docker:latest pixi run pipeline +``` + +Another approach is to use multi-package containers to create custom combinations of packages. +- [Multi-Package BioContainers](https://midnighter.github.io/mulled/) +- [Seqera Containers](https://seqera.io/containers/) + + # Workflows +You can go a long way with a folder of notebooks or scripts and the right tools. But as your project grows more bespoke, it can be worth the effort to use a **[workflow framework](../workflow_frameworks)** like Viash, Nextflow or Snakemake to manage the pipeline for you. + +See https://saeyslab.github.io/polygloty/book/workflow_frameworks/ + # Takeaways
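+To make the workflow framework suggestion above concrete, here is a minimal Snakemake sketch of the same three-step pipeline (a hypothetical Snakefile: the rule names and intermediate file paths are made up; the scripts are the ones from the earlier slides):
+
+```python
+# Snakefile: each rule wraps one script and declares the files it reads and
+# writes, so Snakemake can order the steps and re-run only what is out of date.
+rule all:
+    input: "usecase/results/de_results.csv"
+
+rule load_data:
+    output: "usecase/data/adata.h5ad"
+    shell: "bash scripts/1_load_data.sh"
+
+rule compute_pseudobulk:
+    input: "usecase/data/adata.h5ad"
+    output: "usecase/data/pseudobulk.csv"
+    shell: "python scripts/2_compute_pseudobulk.py"
+
+rule analysis_de:
+    input: "usecase/data/pseudobulk.csv"
+    output: "usecase/results/de_results.csv"
+    shell: "Rscript scripts/3_analysis_de.R"
+```
+
+Running `snakemake --cores 1` would then execute only the steps whose outputs are missing or outdated.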