Commit

change names, add some content to the intro
rcannood committed Sep 7, 2024
1 parent 350efbe commit 4c21d2b
Showing 27 changed files with 66 additions and 45 deletions.
8 changes: 4 additions & 4 deletions _quarto.yml
Expand Up @@ -14,11 +14,11 @@ book:
repo-actions: [edit, issue, source]
chapters:
- index.qmd
- book/intro.qmd
- book/introduction.qmd
- book/usecase/index.qmd
- book/file_formats.qmd
- book/in_memory.qmd
- book/workflows/index.qmd
- book/in_memory_interoperability.qmd
- book/on_disk_interoperability.qmd
- book/workflow_frameworks/index.qmd
- book/book_slides.qmd
- book/references.qmd

23 changes: 0 additions & 23 deletions book/file_formats.qmd

This file was deleted.

2 changes: 1 addition & 1 deletion book/in_memory.qmd → book/in_memory_interoperability.qmd
@@ -1,5 +1,5 @@
---
title: In memory interoperability (from Python)
title: In-memory interoperability
engine: knitr
---

10 changes: 0 additions & 10 deletions book/intro.qmd

This file was deleted.

30 changes: 30 additions & 0 deletions book/introduction.qmd
@@ -0,0 +1,30 @@
---
title: Introduction
engine: knitr
---

Single-cell analysis has emerged as a transformative force in biology,
providing unprecedented insights into cellular heterogeneity and complex biological processes. The rapid advancement in this field has led to a proliferation of specialized tools and methods [@Zappia2021], often developed in different programming languages and software ecosystems. While this diversity empowers researchers to leverage the best tools for each analysis step [@Heumos2023], it also presents a significant challenge: how to seamlessly integrate and execute analyses across these disparate languages and frameworks.

The need to utilize tools from different programming ecosystems creates a "polyglot" landscape in single-cell analysis, where researchers must navigate the complexities of interoperability, data exchange, and workflow management. This fragmentation can hinder productivity, introduce errors, and impede reproducibility.

Researchers can approach this challenge in various ways, each with its own trade-offs. The sections below summarize the main strategies for achieving interoperability in single-cell analysis; the following chapters explore most of them in more detail.

## Code Porting

Porting tools from one language to another can offer complete control and eliminate interoperability concerns. However, one should not underestimate the effort required to reimplement complex algorithms, and the risk of introducing errors.

Furthermore, the work is not done after the initial port: for the port to remain useful to others, it must be maintained and kept up to date with the original implementation. For this reason, we don't consider reimplementation a viable option for most use cases and will not discuss it further in this book.

## In-memory Interoperability

Tools like rpy2 and reticulate allow for direct communication between languages within a single analysis session. This approach provides flexibility and avoids intermediate file I/O, but can introduce complexity in managing dependencies and environments.


## File-based Interoperability

Storing intermediate results in standardized, language-agnostic file formats (e.g., HDF5, Parquet) allows for sequential execution of scripts written in different languages. This approach is relatively simple but can lead to increased storage requirements and I/O overhead.
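
As a minimal sketch of the file-based hand-off described above, the following example uses only Python's standard library, with CSV standing in for a richer format such as HDF5 or Parquet. That substitution is an assumption made to keep the example dependency-free; CSV does not preserve data types or sparsity. The function names `write_counts` and `read_counts` and the toy gene and cell labels are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_counts(path, genes, cells, counts):
    """Step 1: save a cells x genes count matrix to a language-agnostic file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["cell_id", *genes])  # header row holds the gene names
        for cell, row in zip(cells, counts):
            writer.writerow([cell, *row])


def read_counts(path):
    """Step 2: reload the matrix, as a downstream script in any language could."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        genes = next(reader)[1:]  # drop the "cell_id" column label
        cells, counts = [], []
        for row in reader:
            cells.append(row[0])
            counts.append([int(value) for value in row[1:]])
    return genes, cells, counts


# Toy data (hypothetical labels and values, for illustration only).
genes = ["GeneA", "GeneB", "GeneC"]
cells = ["cell_1", "cell_2"]
counts = [[0, 3, 1], [2, 0, 5]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "counts.csv"
    write_counts(path, genes, cells, counts)  # producer step
    loaded_genes, loaded_cells, loaded_counts = read_counts(path)  # consumer step

print(loaded_genes, loaded_cells, loaded_counts)
```

In a real pipeline the two steps would typically live in separate scripts, possibly in different languages, and the intermediate file would be the only contract between them.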

## Workflow Frameworks

Workflow management systems (e.g., Nextflow, Snakemake) provide a structured approach to orchestrate complex, multi-language pipelines, enhancing reproducibility and automation. However, they may require a learning curve and additional configuration.
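
To illustrate the core idea behind orchestration, the sketch below orders a set of pipeline steps by their dependencies using Python's standard-library `graphlib` module (Python 3.9+). It is not how Nextflow or Snakemake are implemented, and it leaves out everything else those systems provide (caching, containers, parallel execution). The step names and the `run_step` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
dependencies = {
    "load_data": set(),
    "normalize": {"load_data"},
    "cluster": {"normalize"},
    "annotate": {"cluster"},
    "report": {"annotate", "normalize"},
}


def run_step(name, log):
    """Placeholder for launching a script in any language; here it only logs."""
    log.append(name)


execution_log = []
# static_order() yields each step only after all of its dependencies.
for step in TopologicalSorter(dependencies).static_order():
    run_step(step, execution_log)

print(execution_log)
```

A workflow framework applies the same principle, but derives the dependency graph from declared inputs and outputs rather than from a hand-written dictionary.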
22 changes: 17 additions & 5 deletions book/anndataR.qmd → book/on_disk_interoperability.qmd
@@ -1,15 +1,27 @@
---
title: "WIP: In memory interoperability (Python side)"
title: On-disk interoperability
engine: knitr
---

Data format based interoperability

# anndataR
1. h5ad / zarr / Apache Arrow
2. Reading and writing these formats

Calling python from R and vice versa
## Setup

1. rpy2 & reticulate
2. How to do this in jupyter notebooks and rmarkdown scripts
```{python}
import anndata
import numpy
import scanpy
```

```{python}
anndata.__version__
```


## anndataR

```{r}
library(anndataR)
14 changes: 14 additions & 0 deletions book/references.bib
Expand Up @@ -51,3 +51,17 @@ @article{Wratten2021
month = sep,
pages = {1161–1168}
}

@article{Zappia2021,
title = {Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape},
volume = {22},
ISSN = {1474-760X},
url = {http://dx.doi.org/10.1186/s13059-021-02519-4},
DOI = {10.1186/s13059-021-02519-4},
number = {1},
journal = {Genome Biology},
publisher = {Springer Science and Business Media LLC},
author = {Zappia, Luke and Theis, Fabian J.},
year = {2021},
month = oct
}
File renamed without changes.
File renamed without changes.
File renamed without changes
File renamed without changes
Expand Up @@ -3,8 +3,6 @@ title: Workflows
author: Robrecht Cannoodt, Data Intuitive
---

Single-cell analysis has revolutionized our understanding of cellular heterogeneity and complex biological processes. However, this cutting-edge field often demands the use of multiple programming languages and frameworks, each with its strengths and specialized tools [@Heumos2023]. This polyglot approach, while powerful, introduces significant technical challenges in terms of interoperability, usability, and reproducibility.

In the previous chapters, we've explored strategies for supporting data interoperability across programming languages. Now, we turn our attention to how to effectively integrate these tools and languages into a cohesive and scalable analysis workflow.

## Productionization
