Commit

Merge branch 'main' of https://github.com/saeyslab/polygloty
berombau committed Sep 7, 2024
2 parents f861741 + ff1778d commit 679022f
Showing 8 changed files with 275 additions and 32 deletions.
20 changes: 8 additions & 12 deletions .github/workflows/publish.yml
@@ -26,13 +26,6 @@ jobs:

- name: Set up Quarto
uses: quarto-dev/quarto-actions/setup@v2

- name: Cache usecase_data
uses: actions/cache@v2
with:
path: |
book/usecase/data
key: usecase_data_${{ runner.os }}

- name: Make sure data is available
run: |
@@ -46,15 +39,18 @@ jobs:
- name: Set up renv
uses: r-lib/actions/setup-renv@v2

- name: Cache _freeze
- name: Cache certain directories to speed up build
uses: actions/cache@v2
with:
path: |
_freeze
key: renv_${{ runner.os }}
book/usecase/data
key: quarto_cache_${{ github.ref_name }}
restore-keys: |
quarto_cache_main
- name: Render slides
- name: Render book
run: |
Rscript -e "renv::restore()"
source renv/python/virtualenvs/renv-python-3.12/bin/activate
quarto render
@@ -74,4 +70,4 @@ jobs:
source-dir: _book
preview-branch: gh-pages
umbrella-dir: pr-preview
action: auto
action: auto
55 changes: 55 additions & 0 deletions .github/workflows/test-mac-arm.yml
@@ -0,0 +1,55 @@
on:
  pull_request:
  push:
    branches: main

name: Test on mac arm

jobs:
  build-deploy:
    runs-on: macos-latest

    permissions:
      contents: write

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up specific version of Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install R
        uses: r-lib/actions/setup-r@v2

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Make sure data is available
        run: |
          AWS_EC2_METADATA_DISABLED=true \
          aws s3 cp \
            --no-sign-request \
            s3://openproblems-bio/public/neurips-2023-competition/sc_counts_reannotated_with_counts.h5ad \
            book/usecase/data/sc_counts_reannotated_with_counts.h5ad
      # attempt with renv
      - name: Set up renv
        uses: r-lib/actions/setup-renv@v2

      - name: Cache certain directories to speed up build
        uses: actions/cache@v2
        with:
          path: |
            book/usecase/data
          key: quarto_cache_osx_${{ github.ref_name }}
          restore-keys: |
            quarto_cache_osx_main
      - name: Render book
        run: |
          Rscript -e "renv::restore()"
          source renv/python/virtualenvs/renv-python-3.12/bin/activate
          quarto render
3 changes: 3 additions & 0 deletions book/in_memory2.qmd
@@ -3,6 +3,9 @@ title: In memory interoperability (from R)
engine: knitr
---

One approach to interoperability is to work with in-memory representations of an object and to convert these between programming languages in memory. This does not require you to write out your datasets and read them back into the other programming environment, but it does require you to set up an environment for both languages, which can be cumbersome.


There are multiple ways to do this.

In this notebook, we will showcase how to call Python code from R.
85 changes: 85 additions & 0 deletions book/in_memory_interoperability.qmd
@@ -3,6 +3,91 @@ title: In-memory interoperability
engine: knitr
---

One approach to interoperability is to work with in-memory representations of an object and to convert these between programming languages in memory. This does not require you to write out your datasets and read them back into the other programming environment, but it does require you to set up an environment for both languages, which can be cumbersome.
Typically, one language acts as the main host language, and you interact with the other language through an FFI (foreign function interface).
To evaluate R code from within a Python program, we will make use of rpy2.

Rpy2 is a foreign function interface to R. It can be used in the following way:
```{python}
import rpy2
import rpy2.robjects as robjects
vector = robjects.IntVector([1,2,3])
rsum = robjects.r['sum']
rsum(vector)
```

Luckily, we are not restricted to creating R objects and calling R functions on them. The real power of this in-memory interoperability lies in converting Python objects to R objects, calling R functions on them, and converting the results back to Python objects.
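
A minimal sketch of that round trip using only base rpy2, before any extra converter modules are introduced; `py_values` and `r_mean` are illustrative names:

```{python}
# Convert a Python list to an R vector, call an R function on it,
# and read the result back out as a plain Python float.
py_values = [1.5, 2.5, 3.5]
r_mean = robjects.r['mean']                         # look up R's mean()
r_result = r_mean(robjects.FloatVector(py_values))
py_result = r_result[0]                             # R vectors support Python-style indexing
py_result
```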

Rpy2 requires specific conversion rules for different Python objects. It is straightforward to create R vectors from corresponding Python lists:

```{python}
str_vector = robjects.StrVector(['abc', 'def', 'ghi'])
flt_vector = robjects.FloatVector([0.3, 0.8, 0.7])
int_vector = robjects.IntVector([1, 2, 3])
mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)
```

However, for single-cell biology, the objects that are most interesting to convert are (count) matrices, arrays, and data frames. To do this, you need to import the corresponding rpy2 modules and specify the conversion context.

```{python}
import numpy as np
from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter
rd_m = np.random.random((10, 7))
with (default_converter + numpy2ri.converter).context():
    mtx2 = robjects.r.matrix(rd_m, nrow = 10)
```
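
With the converter active, the round trip works here too. A sketch (assuming R's built-in `colMeans()`) that calls an R function on the converted NumPy matrix and brings the result back as a NumPy array:

```{python}
# Inside the conversion context, call an R function on the NumPy-backed
# R matrix and wrap the numeric result back into a NumPy array.
with (default_converter + numpy2ri.converter).context():
    r_col_means = robjects.r['colMeans'](robjects.r.matrix(rd_m, nrow=10))
    col_means = np.asarray(r_col_means)
col_means
```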

```{python}
import pandas as pd
from rpy2.robjects import pandas2ri
pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})
with (default_converter + pandas2ri.converter).context():
    pd_df_r = robjects.DataFrame(pd_df)
```
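
The conversion should also work in the other direction: inside the same context, an R function that returns a data.frame comes back as a pandas DataFrame. A small sketch using R's built-in `head()`:

```{python}
# Within the pandas converter context, the data.frame returned by R's head()
# is converted back into a pandas DataFrame.
with (default_converter + pandas2ri.converter).context():
    pd_head = robjects.r['head'](pd_df_r, n=2)
pd_head
```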

One big limitation of rpy2 is its inability to convert sparse matrices: there is no built-in conversion module for SciPy. The `anndata2ri` package provides, apart from functionality to convert SingleCellExperiment objects to AnnData objects, functions to convert sparse matrices.

TODO: how to subscript a sparse matrix? Is it possible?

```{python}
import scipy as sp
from anndata2ri import scipy2ri
sparse_matrix = sp.sparse.csc_matrix(rd_m)
with (default_converter + scipy2ri.converter).context():
    sp_r = scipy2ri.py2rpy(sparse_matrix)
```
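
The reverse direction is available as well; a sketch assuming `scipy2ri.rpy2py` mirrors the `py2rpy` call above:

```{python}
# Turn the R sparse Matrix object back into a SciPy sparse matrix.
with (default_converter + scipy2ri.converter).context():
    sparse_matrix_back = scipy2ri.rpy2py(sp_r)
type(sparse_matrix_back)
```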

We will also showcase how to use anndata2ri to convert an AnnData object to a SingleCellExperiment object and vice versa:
```{python}
import anndata as ad
import scanpy.datasets as scd
import anndata2ri
adata_paul = scd.paul15()
with anndata2ri.converter.context():
    sce = anndata2ri.py2rpy(adata_paul)
    ad2 = anndata2ri.rpy2py(sce)
```
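
Once converted, the SingleCellExperiment is a regular R object, so R functions can be called on it directly; a small sketch using R's `dim()`, which for an SCE should return the number of genes and cells:

```{python}
# Call an R function on the converted SingleCellExperiment object.
robjects.r['dim'](sce)
```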


Besides creating R objects and calling R functions, these converters let you move entire data structures between Python and R without leaving memory.

In this notebook, we will showcase how to call R code from Python.
We will make use of rpy2 and anndata2ri.

2 changes: 1 addition & 1 deletion index.qmd
@@ -4,6 +4,6 @@ This book is a collection of notebooks and explanations for the workshop on **Po

In order to use the best-performing methods for each step of the single-cell analysis process, bioinformaticians need to work with multiple ecosystems and programming languages. Unfortunately, this is not straightforward. This workshop gives an overview of the different levels of interoperability and how they can be combined in a single workflow.

To get started, read the [Introduction](book/intro.qmd) chapter.
To get started, read the [Introduction](book/introduction.qmd) chapter.

To learn more about Quarto books visit <https://quarto.org/docs/books>.