Migrate components from main repository (#1)

* Update common submodule
* Add ORCiD to task config
* Migrate metrics/clustering_performance component
* Remove template metrics/accuracy component
* Style YAML files
* Style Python files
  Using Black and isort
* Migrate methods/pca component
* Remove template methods/logistic_regression comp
* Move links in metrics for consistency with methods
* Migrate control_methods/spectral_features comp
* Tidy component config YAML files
  Add comments separating sections
* Fix template name in control_methods/spectral_features
* Migrate control_methods/true_features component
* Migrate metrics/coranking component
* Migrate metrics/density_preservation component
* Migrate metrics/distance_correlation component
* Migrate metrics/trustworthiness component
* Migrate methods/densmap component
* Migrate methods/diffusion_map component
* Migrate methods/ivis component
* Migrate methods/lmds component
* Update data_processor component
  Match template and perform correct processing for this task
* Set diffusion_map default to 2 dimensions
* Migrate methods/neuralee component
* Migrate methods/phate component
* Migrate methods/pymde component
* Migrate methods/simlr component
* Migrate methods/tsne component
* Migrate methods/umap component
* Render README
* Update CHANGELOG.md
* update to viash 0.9.0
* update format of components
* fix lmds

---------

Co-authored-by: Robrecht Cannoodt <[email protected]>
Co-authored-by: bendemeo <[email protected]>
Co-authored-by: michalk8 <[email protected]>
Co-authored-by: scottgigante <[email protected]>
Co-authored-by: Kai Waldrant <[email protected]>
Co-authored-by: sainirmayi <[email protected]>
Co-authored-by: jacorvar <[email protected]>
openproblems-bio · Sep 7, 2024 · ab268cc
1 parent 3263edd

Showing 54 changed files with 2,281 additions and 355 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,26 +1,28 @@
-# task_template x.y.z
+<!-- # dimensionality_reduction x.y.z
 
 ## BREAKING CHANGES
 
-<!-- * Restructured `src` directory (PR #3). -->
+* Restructured `src` directory (PR #3).
 
 ## NEW FUNCTIONALITY
 
 * Added `control_methods/true_labels` component (PR #5).
-
 * Added `methods/logistic_regression` component (PR #5).
-
 * Added `metrics/accuracy` component (PR #5).
 
 ## MAJOR CHANGES
 
 * Updated `api` files (PR #5).
-
 * Updated configs, components and CI to the latest Viash version (PR #8).
 
 ## MINOR CHANGES
 
 * Updated `README.md` (PR #5).
 
-## BUGFIXES
+## BUGFIXES -->
+
+# dimensionality_reduction 0.1.0 2024-09-05
+
+## NEW FUNCTIONALITY
 
+* Migrated components from the main Open Problems repository (PR #1)
diff --git a/README.md b/README.md
@@ -93,8 +93,40 @@ flowchart LR
 
 The dataset to pass to a method.
 
-Example file:
-`resources_test/dimensionality_reduction/pancreas/dataset.h5ad`
+Example file: `resources_test/common/pancreas/dataset.h5ad`
+
+Format:
+
+<div class="small">
+
+    AnnData object
+     obs: 'cell_type'
+     var: 'hvg_score'
+     layers: 'counts', 'normalized'
+     uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
+
+</div>
+
+Data structure:
+
+<div class="small">
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Classification of the cell type based on its characteristics and function within the tissue or organism. |
+| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `layers["normalized"]` | `double` | Normalized expression values. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
+
+</div>
 
 ## Component type: Data processor
 
@@ -119,13 +151,71 @@ The dataset to pass to a method.
 Example file:
 `resources_test/dimensionality_reduction/pancreas/dataset.h5ad`
 
+Format:
+
+<div class="small">
+
+    AnnData object
+     var: 'hvg_score'
+     layers: 'counts', 'normalized'
+     uns: 'dataset_id', 'normalization_id'
+
+</div>
+
+Data structure:
+
+<div class="small">
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `layers["normalized"]` | `double` | Normalized expression values. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
+
+</div>
+
 ## File format: Test data
 
 The data for evaluating a dimensionality reduction.
 
 Example file:
 `resources_test/dimensionality_reduction/pancreas/solution.h5ad`
 
+Format:
+
+<div class="small">
+
+    AnnData object
+     obs: 'cell_type'
+     var: 'hvg_score'
+     layers: 'counts', 'normalized'
+     uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
+
+</div>
+
+Data structure:
+
+<div class="small">
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Classification of the cell type based on its characteristics and function within the tissue or organism. |
+| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `layers["normalized"]` | `double` | Normalized expression values. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
+
+</div>
+
 ## Component type: Control method
 
 Quality control methods for verifying the pipeline.
@@ -180,10 +270,56 @@ A dataset with dimensionality reduction embedding.
 Example file:
 `resources_test/dimensionality_reduction/pancreas/embedding.h5ad`
 
+Format:
+
+<div class="small">
+
+    AnnData object
+     obsm: 'X_emb'
+     uns: 'dataset_id', 'method_id', 'normalization_id'
+
+</div>
+
+Data structure:
+
+<div class="small">
+
+| Slot                      | Type     | Description                          |
+|:--------------------------|:---------|:-------------------------------------|
+| `obsm["X_emb"]`           | `double` | The dimensionally reduced embedding. |
+| `uns["dataset_id"]`       | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]`        | `string` | A unique identifier for the method.  |
+| `uns["normalization_id"]` | `string` | Which normalization was used.        |
+
+</div>
+
 ## File format: Score
 
 Metric score file
 
 Example file:
 `resources_test/dimensionality_reduction/pancreas/score.h5ad`
 
+Format:
+
+<div class="small">
+
+    AnnData object
+     uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'
+
+</div>
+
+Data structure:
+
+<div class="small">
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
+| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
+
+</div>
+
diff --git a/_viash.yaml b/_viash.yaml
@@ -1,4 +1,4 @@
-viash_version: 0.9.0-RC7
+viash_version: 0.9.0
 
 # Step 1: Change the name of the task.
 # example: task_name_of_this_task
@@ -34,7 +34,7 @@ description: |
   of continuous trajectories. Despite almost every single-cell study using one of these visualisations
   there has been debate as to whether they can effectively capture the variation in single-cell
   datasets [@chari2023speciousart].
-  
+
   The dimensionality reduction task attempts to quantify the ability of methods to embed the
   information present in complex single-cell studies into a two-dimensional space. Thus, this task
   is specifically designed for dimensionality reduction for visualisation and does not consider other
@@ -61,7 +61,7 @@ description: |
 #         publisher = {Research Square},
 #         year = {2021},
 #       }
-  
+
 info:
   image: thumbnail.svg
   # Step 5: Replace the task_template to the name of the task.
@@ -74,41 +74,42 @@ info:
       dest: resources_test/dimensionality_reduction
 
 # Step 6: Update the authors of the task.
-authors: 
+authors:
   - name: Luke Zappia
-    roles: [ maintainer, author ]
+    roles: [maintainer, author]
     info:
       github: lazappi
+      orcid: 0000-0001-7744-8565
   - name: Michal Klein
-    roles: [ author ]
+    roles: [author]
     info:
       github: michalk8
   - name: Scott Gigante
-    roles: [ author ]
+    roles: [author]
     info:
       github: scottgigante
       orcid: "0000-0002-4544-2764"
   - name: Ben DeMeo
-    roles: [ author ]
+    roles: [author]
     info:
       github: bendemeo
   - name: Robrecht Cannoodt
-    roles: [ author ]
+    roles: [author]
     info:
       github: rcannood
       orcid: 0000-0003-3641-729X
   - name: Kai Waldrant
-    roles: [ contributor ]
+    roles: [contributor]
     info:
       github: KaiWaldrant
       orcid: 0009-0003-8555-1361
   - name: Sai Nirmayi Yasa
-    roles: [ contributor ]
+    roles: [contributor]
     info:
       github: sainirmayi
       orcid: 0009-0003-6319-9803
   - name: Juan A. Cordero Varela
-    roles: [ contributor ]
+    roles: [contributor]
     info:
       github: jacorvar
       orcid: 0000-0002-7373-5433
@@ -123,4 +124,4 @@ repositories:
   - name: openproblems-v2
     type: github
     repo: openproblems-bio/openproblems-v2
-    tag: main_build
+    tag: main_build
diff --git a/common b/common
diff --git a/src/api/comp_process_dataset.yaml b/src/api/comp_process_dataset.yaml
@@ -20,7 +20,7 @@ arguments:
     direction: output
     required: true
 test_resources:
-  - path: /resources_test/dimensionality_reduction/pancreas/
-    dest: resources_test/dimensionality_reduction/pancreas/
+  - path: /resources_test/common/pancreas/
+    dest: resources_test/common/pancreas/
   - type: python_script
     path: /common/component_tests/run_and_check_output.py
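Note on the hunk above: the data processor test now reads the common pancreas dataset, and the README format tables in this commit show that the method-facing dataset keeps only a subset of the common dataset's slots. The sketch below is a minimal illustration of that difference, assuming the slot lists from the README tables; `dropped_slots` is a hypothetical helper and not part of the actual component.

```python
# Slot names copied from the README format tables in this commit.
COMMON_DATASET_SLOTS = {
    "obs": ["cell_type"],
    "var": ["hvg_score"],
    "layers": ["counts", "normalized"],
    "uns": [
        "dataset_id", "dataset_name", "dataset_url", "dataset_reference",
        "dataset_summary", "dataset_description", "dataset_organism",
        "normalization_id",
    ],
}

DATASET_SLOTS = {
    "var": ["hvg_score"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "normalization_id"],
}


def dropped_slots(source, target):
    """Return the slots listed in `source` that are absent from `target`.

    Hypothetical helper for illustration only; both arguments map a slot
    group (obs, var, layers, uns) to a list of slot names.
    """
    dropped = {}
    for group, names in source.items():
        kept = set(target.get(group, []))
        missing = [name for name in names if name not in kept]
        if missing:
            dropped[group] = missing
    return dropped


if __name__ == "__main__":
    for group, names in dropped_slots(COMMON_DATASET_SLOTS, DATASET_SLOTS).items():
        print(f"{group}: {', '.join(names)}")
```

Under these assumptions, the dataset passed to methods omits `obs["cell_type"]` and the descriptive `uns` fields, while the solution file keeps them for evaluation.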
diff --git a/src/api/file_common_dataset.yaml b/src/api/file_common_dataset.yaml
@@ -1,11 +1,11 @@
 type: file
-example: "resources_test/dimensionality_reduction/pancreas/dataset.h5ad"
+example: "resources_test/common/pancreas/dataset.h5ad"
 label: "Dataset"
 summary: "The dataset to pass to a method."
 info:
   format:
     type: h5ad
-    layers: 
+    layers:
       - type: integer
         name: counts
         description: Raw counts
@@ -14,7 +14,7 @@ info:
         name: normalized
         description: Normalized expression values
         required: true
-    obs: 
+    obs:
       - type: string
         name: cell_type
         description: Classification of the cell type based on its characteristics and function within the tissue or organism.

diff --git a/src/api/file_dataset.yaml b/src/api/file_dataset.yaml
@@ -5,7 +5,7 @@ summary: "The dataset to pass to a method."
 info:
   format:
     type: h5ad
-    layers: 
+    layers:
       - type: integer
         name: counts
         description: Raw counts

diff --git a/src/api/file_embedding.yaml b/src/api/file_embedding.yaml
@@ -23,4 +23,3 @@ info:
         name: normalization_id
         description: "Which normalization was used"
         required: true
-
diff --git a/src/api/file_score.yaml b/src/api/file_score.yaml
@@ -22,9 +22,9 @@ info:
         name: metric_ids
         description: "One or more unique metric identifiers"
         multiple: true
-        required: true        
+        required: true
       - type: double
         name: metric_values
         description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'."
         multiple: true
-        required: true
+        required: true
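Note on the score format above: the spec requires `metric_values` to have the same length as `metric_ids`. The sketch below shows one way a metric component could check this before writing its output. It is an assumption-laden illustration: `validate_score_uns` and `_as_list` are hypothetical names, and the function takes any dict-like mapping (for an AnnData object this would be its `uns` attribute).

```python
# Keys required in `uns` by the score file format in this commit.
REQUIRED_SCORE_KEYS = (
    "dataset_id",
    "normalization_id",
    "method_id",
    "metric_ids",
    "metric_values",
)


def _as_list(value):
    """Wrap a scalar in a list so single metrics and lists are handled alike."""
    if isinstance(value, (str, int, float)):
        return [value]
    return list(value)


def validate_score_uns(uns):
    """Return a list of problems found in a score `uns` mapping.

    Hypothetical helper for illustration; an empty list means the mapping
    has every required key and matching metric lengths.
    """
    problems = [
        f"missing uns['{key}']" for key in REQUIRED_SCORE_KEYS if key not in uns
    ]
    if problems:
        return problems

    ids = _as_list(uns["metric_ids"])
    values = _as_list(uns["metric_values"])
    if len(ids) != len(values):
        problems.append(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return problems


if __name__ == "__main__":
    example = {
        "dataset_id": "pancreas",
        "normalization_id": "example_normalization",  # placeholder value
        "method_id": "pca",
        "metric_ids": ["trustworthiness"],
        "metric_values": [0.9],  # made-up number for illustration
    }
    print(validate_score_uns(example))
```

Returning a list of problems rather than raising on the first one is a design choice made here so that all issues can be reported at once.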
diff --git a/src/api/file_solution.yaml b/src/api/file_solution.yaml
@@ -5,7 +5,7 @@ summary: "The data for evaluating a dimensionality reduction."
 info:
   format:
     type: h5ad
-    layers: 
+    layers:
       - type: integer
         name: counts
         description: Raw counts
@@ -14,7 +14,7 @@ info:
         name: normalized
         description: Normalized expression values
         required: true
-    obs: 
+    obs:
       - type: string
         name: cell_type
         description: Classification of the cell type based on its characteristics and function within the tissue or organism.
+23 −32		component_tests/check_config.py
+26 −21		component_tests/run_and_check_output.py