Commit

Merge branch 'main' of https://github.com/saeyslab/polygloty
berombau committed Sep 7, 2024
2 parents f861741 + ff1778d commit 679022f
Showing 8 changed files with 275 additions and 32 deletions.
20 changes: 8 additions & 12 deletions .github/workflows/publish.yml
@@ -26,13 +26,6 @@ jobs:

- name: Set up Quarto
uses: quarto-dev/quarto-actions/setup@v2

- name: Cache usecase_data
uses: actions/cache@v2
with:
path: |
book/usecase/data
key: usecase_data_${{ runner.os }}

- name: Make sure data is available
run: |
@@ -46,15 +39,18 @@ jobs:
- name: Set up renv
uses: r-lib/actions/setup-renv@v2

- name: Cache _freeze
- name: Cache certain directories to speed up build
uses: actions/cache@v2
with:
path: |
_freeze
key: renv_${{ runner.os }}
book/usecase/data
key: quarto_cache_${{ github.ref_name }}
restore-keys: |
quarto_cache_main
- name: Render slides
- name: Render book
run: |
Rscript -e "renv::restore()"
source renv/python/virtualenvs/renv-python-3.12/bin/activate
quarto render
@@ -74,4 +70,4 @@ jobs:
source-dir: _book
preview-branch: gh-pages
umbrella-dir: pr-preview
action: auto
action: auto
55 changes: 55 additions & 0 deletions .github/workflows/test-mac-arm.yml
@@ -0,0 +1,55 @@
on:
  pull_request:
  push:
    branches: main

name: Test on mac arm

jobs:
  build-deploy:
    runs-on: macos-latest

    permissions:
      contents: write

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up specific version of Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install R
        uses: r-lib/actions/setup-r@v2

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Make sure data is available
        run: |
          AWS_EC2_METADATA_DISABLED=true \
          aws s3 cp \
            --no-sign-request \
            s3://openproblems-bio/public/neurips-2023-competition/sc_counts_reannotated_with_counts.h5ad \
            book/usecase/data/sc_counts_reannotated_with_counts.h5ad
      # attempt with renv
      - name: Set up renv
        uses: r-lib/actions/setup-renv@v2

      - name: Cache certain directories to speed up build
        uses: actions/cache@v2
        with:
          path: |
            book/usecase/data
          key: quarto_cache_osx_${{ github.ref_name }}
          restore-keys: |
            quarto_cache_osx_main
      - name: Render book
        run: |
          Rscript -e "renv::restore()"
          source renv/python/virtualenvs/renv-python-3.12/bin/activate
          quarto render
3 changes: 3 additions & 0 deletions book/in_memory2.qmd
@@ -3,6 +3,9 @@ title: In memory interoperability (from R)
engine: knitr
---

One approach to interoperability is to work with in-memory representations of an object and to convert these between programming languages in memory. This does not require you to write out your datasets and read them back into the other programming environment, but it does require you to set up an environment for both languages, which can be cumbersome.


There are multiple ways to do this.

In this notebook, we will showcase how to call Python code from R.
85 changes: 85 additions & 0 deletions book/in_memory_interoperability.qmd
@@ -3,6 +3,91 @@ title: In-memory interoperability
engine: knitr
---

One approach to interoperability is to work with in-memory representations of an object and to convert these between programming languages in memory. This does not require you to write out your datasets and read them back into the other programming environment, but it does require you to set up an environment for both languages, which can be cumbersome.
Typically, one language acts as the main host language, and you interact with the other language through an FFI (foreign function interface).
To evaluate R code from within a Python program, we will make use of rpy2.

Rpy2 is a foreign function interface to R. It can be used in the following way:
```{python}
import rpy2
import rpy2.robjects as robjects
vector = robjects.IntVector([1,2,3])
rsum = robjects.r['sum']
rsum(vector)
```

Luckily, we are not restricted to creating R objects and calling R functions on them. The real power of this in-memory interoperability lies in converting Python objects to R objects, calling R functions on them, and converting the results back to Python objects.
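
A minimal sketch of that round trip using only base rpy2, before any extra converter modules are introduced; `py_values` and `r_mean` are illustrative names:

```{python}
# Convert a Python list to an R vector, call an R function on it,
# and read the result back out as a plain Python float.
py_values = [1.5, 2.5, 3.5]
r_mean = robjects.r['mean']                         # look up R's mean()
r_result = r_mean(robjects.FloatVector(py_values))
py_result = r_result[0]                             # R vectors support Python-style indexing
py_result
```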

Rpy2 requires specific conversion rules for different Python objects. It is straightforward to create R vectors from corresponding Python lists:

```{python}
str_vector = robjects.StrVector(['abc', 'def', 'ghi'])
flt_vector = robjects.FloatVector([0.3, 0.8, 0.7])
int_vector = robjects.IntVector([1, 2, 3])
mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)
```

However, for single-cell biology, the objects that are most interesting to convert are (count) matrices, arrays, and data frames. To do this, you need to import the corresponding rpy2 modules and specify the conversion context.

```{python}
import numpy as np
from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter
rd_m = np.random.random((10, 7))
with (default_converter + numpy2ri.converter).context():
    mtx2 = robjects.r.matrix(rd_m, nrow = 10)
```
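
With the converter active, the round trip works here too. A sketch (assuming R's built-in `colMeans()`) that calls an R function on the converted NumPy matrix and brings the result back as a NumPy array:

```{python}
# Inside the conversion context, call an R function on the NumPy-backed
# R matrix and wrap the numeric result back into a NumPy array.
with (default_converter + numpy2ri.converter).context():
    r_col_means = robjects.r['colMeans'](robjects.r.matrix(rd_m, nrow=10))
    col_means = np.asarray(r_col_means)
col_means
```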

```{python}
import pandas as pd
from rpy2.robjects import pandas2ri
pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})
with (default_converter + pandas2ri.converter).context():
    pd_df_r = robjects.DataFrame(pd_df)
```
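
The conversion should also work in the other direction: inside the same context, an R function that returns a data.frame comes back as a pandas DataFrame. A small sketch using R's built-in `head()`:

```{python}
# Within the pandas converter context, the data.frame returned by R's head()
# is converted back into a pandas DataFrame.
with (default_converter + pandas2ri.converter).context():
    pd_head = robjects.r['head'](pd_df_r, n=2)
pd_head
```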

One big limitation of rpy2 is its inability to convert sparse matrices: there is no built-in conversion module for SciPy. The `anndata2ri` package provides, apart from functionality to convert SingleCellExperiment objects to AnnData objects, functions to convert sparse matrices.

TODO: how to subscript a sparse matrix? Is it possible?

```{python}
import scipy as sp
from anndata2ri import scipy2ri
sparse_matrix = sp.sparse.csc_matrix(rd_m)
with (default_converter + scipy2ri.converter).context():
    sp_r = scipy2ri.py2rpy(sparse_matrix)
```
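
The reverse direction is available as well; a sketch assuming `scipy2ri.rpy2py` mirrors the `py2rpy` call above:

```{python}
# Turn the R sparse Matrix object back into a SciPy sparse matrix.
with (default_converter + scipy2ri.converter).context():
    sparse_matrix_back = scipy2ri.rpy2py(sp_r)
type(sparse_matrix_back)
```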

We will also showcase how to use anndata2ri to convert an AnnData object to a SingleCellExperiment object and vice versa:
```{python}
import anndata as ad
import scanpy.datasets as scd
import anndata2ri
adata_paul = scd.paul15()
with anndata2ri.converter.context():
    sce = anndata2ri.py2rpy(adata_paul)
    ad2 = anndata2ri.rpy2py(sce)
```
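
Once converted, the SingleCellExperiment is a regular R object, so R functions can be called on it directly; a small sketch using R's `dim()`, which for an SCE should return the number of genes and cells:

```{python}
# Call an R function on the converted SingleCellExperiment object.
robjects.r['dim'](sce)
```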


Besides creating R objects and calling R functions, these converters let you move entire data structures between Python and R without leaving memory.

In this notebook, we will showcase how to call R code from Python.
We will make use of rpy2 and anndata2ri.

2 changes: 1 addition & 1 deletion index.qmd
@@ -4,6 +4,6 @@ This book is a collection of notebooks and explanations for the workshop on **Po

In order to use the best-performing methods for each step of the single-cell analysis process, bioinformaticians need to work with multiple ecosystems and programming languages. Unfortunately, this is not straightforward. This workshop gives an overview of the different levels of interoperability and how they can be combined in a single workflow.

To get started, read the [Introduction](book/intro.qmd) chapter.
To get started, read the [Introduction](book/introduction.qmd) chapter.

To learn more about Quarto books visit <https://quarto.org/docs/books>.