Migrate components from main repository (#1)
* Update common submodule

* Add ORCiD to task config

* Migrate metrics/clustering_performance component

* Remove template metrics/accuracy component

* Style YAML files

* Style Python files

Using Black and isort

* Migrate methods/pca component

* Remove template methods/logistic_regression comp

* Move links in metrics for consistency with methods

* Migrate control_methods/spectral_features comp

* Tidy component config YAML files

Add comments separating sections

* Fix template name in control_methods/spectral_features

* Migrate control_methods/true_features component

* Migrate metrics/coranking component

* Migrate metrics/density_preservation component

* Migrate metrics/distance_correlation component

* Migrate metrics/trustworthiness component

* Migrate methods/densmap component

* Migrate methods/diffusion_map component

* Migrate methods/ivis component

* Migrate methods/lmds component

* Update data_processor component

Match template and perform correct processing for this task

* Set diffusion_map default to 2 dimensions

* Migrate methods/neuralee component

* Migrate methods/phate component

* Migrate methods/pymde component

* Migrate methods/simlr component

* Migrate methods/tsne component

* Migrate methods/umap component

* Render README

* Update CHANGELOG.md

* update to viash 0.9.0

* update format of components

* fix lmds

---------

Co-authored-by: Robrecht Cannoodt
Co-authored-by: bendemeo
Co-authored-by: michalk8
Co-authored-by: scottgigante
Co-authored-by: Kai Waldrant
Co-authored-by: sainirmayi
Co-authored-by: jacorvar
8 people authored Sep 7, 2024
1 parent 3263edd commit ab268cc
Showing 54 changed files with 2,281 additions and 355 deletions.
14 changes: 8 additions & 6 deletions CHANGELOG.md
@@ -1,26 +1,28 @@
# task_template x.y.z
<!-- # dimensionality_reduction x.y.z
## BREAKING CHANGES
<!-- * Restructured `src` directory (PR #3). -->
* Restructured `src` directory (PR #3).
## NEW FUNCTIONALITY
* Added `control_methods/true_labels` component (PR #5).

* Added `methods/logistic_regression` component (PR #5).

* Added `metrics/accuracy` component (PR #5).
## MAJOR CHANGES
* Updated `api` files (PR #5).

* Updated configs, components and CI to the latest Viash version (PR #8).
## MINOR CHANGES
* Updated `README.md` (PR #5).
## BUGFIXES
## BUGFIXES -->

# dimensionality_reduction 0.1.0 2024-09-05

## NEW FUNCTIONALITY

* Migrated components from the main Open Problems repository (PR #1)
140 changes: 138 additions & 2 deletions README.md
@@ -93,8 +93,40 @@ flowchart LR

The dataset to pass to a method.

Example file:
`resources_test/dimensionality_reduction/pancreas/dataset.h5ad`
Example file: `resources_test/common/pancreas/dataset.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type'
var: 'hvg_score'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Classification of the cell type based on its characteristics and function within the tissue or organism. |
| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>
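
As a rough illustration of how the slot table above could be checked, the following sketch lists which required slots are absent from a file. It is not part of the repository: the dict-of-sets layout and the name `missing_slots` are hypothetical, and it assumes that every slot not marked *Optional* in the table is required.

```python
# Slot names come from the table above. The layout (slot group -> set of
# keys) and the function name are assumptions made for this sketch.
# The three slots marked Optional in the table are deliberately left out.
REQUIRED_SLOTS = {
    "obs": {"cell_type"},
    "var": {"hvg_score"},
    "layers": {"counts", "normalized"},
    "uns": {
        "dataset_id",
        "dataset_name",
        "dataset_summary",
        "dataset_description",
        "normalization_id",
    },
}


def missing_slots(present):
    """Return the required slots absent from `present` as 'group["key"]' strings."""
    missing = []
    for group, keys in REQUIRED_SLOTS.items():
        found = set(present.get(group, ()))
        # Sort so the report is stable regardless of set ordering.
        for key in sorted(keys - found):
            missing.append(f'{group}["{key}"]')
    return missing
```

With a loaded AnnData object `adata`, the `present` mapping could plausibly be built from `adata.obs.columns`, `adata.var.columns`, `adata.layers.keys()` and `adata.uns.keys()`.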

## Component type: Data processor

@@ -119,13 +151,71 @@ The dataset to pass to a method.
Example file:
`resources_test/dimensionality_reduction/pancreas/dataset.h5ad`

Format:

<div class="small">

AnnData object
var: 'hvg_score'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>

## File format: Test data

The data for evaluating a dimensionality reduction.

Example file:
`resources_test/dimensionality_reduction/pancreas/solution.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type'
var: 'hvg_score'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | Classification of the cell type based on its characteristics and function within the tissue or organism. |
| `var["hvg_score"]` | `double` | High variability gene score (normalized dispersion). The greater, the more variable. |
| `layers["counts"]` | `integer` | Raw counts. |
| `layers["normalized"]` | `double` | Normalized expression values. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>
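
Comparing the two formats, the dataset passed to methods carries only `var["hvg_score"]`, the two layers, and two `uns` keys, while the test data keeps `obs["cell_type"]` and the dataset metadata. A minimal sketch of that kind of slot subsetting, using plain nested dicts in place of AnnData, might look like this. The helper `subset_slots`, the `TRAIN_SLOTS` constant, and all toy values are hypothetical, and this is not the repository's data processor script.

```python
# Slots kept in the dataset passed to methods, as listed in the format above.
TRAIN_SLOTS = {
    "var": ["hvg_score"],
    "layers": ["counts", "normalized"],
    "uns": ["dataset_id", "normalization_id"],
}


def subset_slots(data, keep):
    """Copy only the listed keys of each slot group; unlisted groups are dropped."""
    return {
        group: {key: data[group][key] for key in keys}
        for group, keys in keep.items()
    }


# Toy stand-in for a common dataset (all values are made up for illustration).
common = {
    "obs": {"cell_type": ["alpha", "beta"]},
    "var": {"hvg_score": [0.7, 1.3]},
    "layers": {"counts": [[1, 0], [0, 2]], "normalized": [[0.69, 0.0], [0.0, 1.1]]},
    "uns": {
        "dataset_id": "pancreas",
        "dataset_name": "Pancreas",
        "normalization_id": "example_norm",
    },
}

# The method input no longer exposes cell type labels or dataset metadata.
train = subset_slots(common, TRAIN_SLOTS)
```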

## Component type: Control method

Quality control methods for verifying the pipeline.
@@ -180,10 +270,56 @@ A dataset with dimensionality reduction embedding.
Example file:
`resources_test/dimensionality_reduction/pancreas/embedding.h5ad`

Format:

<div class="small">

AnnData object
obsm: 'X_emb'
uns: 'dataset_id', 'method_id', 'normalization_id'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:--------------------------|:---------|:-------------------------------------|
| `obsm["X_emb"]` | `double` | The dimensionally reduced embedding. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |

</div>
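
The embedding format above can be sanity-checked with a few lines of plain Python. This sketch assumes the embedding is given as a list of rows and that it should have exactly two columns, an assumption based on the task description in `_viash.yaml`, which speaks of a two-dimensional space. The name `check_embedding` and its parameters are hypothetical.

```python
# `uns` keys listed in the embedding format above.
REQUIRED_UNS = ("dataset_id", "method_id", "normalization_id")


def check_embedding(x_emb, uns, n_obs, n_dims=2):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # One embedding row is expected per observation (cell).
    if len(x_emb) != n_obs:
        problems.append(f"expected {n_obs} rows, found {len(x_emb)}")
    # Assumption: the embedding is two-dimensional unless n_dims says otherwise.
    if any(len(row) != n_dims for row in x_emb):
        problems.append(f"every row must have {n_dims} columns")
    for key in REQUIRED_UNS:
        if key not in uns:
            problems.append(f'missing uns["{key}"]')
    return problems
```

With AnnData, `x_emb` would correspond to `adata.obsm["X_emb"]` and `n_obs` to `adata.n_obs`.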

## File format: Score

Metric score file

Example file:
`resources_test/dimensionality_reduction/pancreas/score.h5ad`

Format:

<div class="small">

AnnData object
uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values'

</div>

Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["normalization_id"]` | `string` | Which normalization was used. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |

</div>
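
The table above requires `metric_values` to have the same length as `metric_ids`. A minimal sketch of that check, operating on a plain dict standing in for `uns`, could look like this. The function name `check_score` and the handling of single scalar values are assumptions for illustration, not the repository's test code.

```python
def check_score(uns):
    """Pair metric ids with values, raising ValueError if the lengths differ."""
    ids = uns["metric_ids"]
    values = uns["metric_values"]
    # Assumption: a single metric may be stored as a bare string / number.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries but metric_values has {len(values)}"
        )
    return dict(zip(ids, values))
```

For example, `check_score({"metric_ids": "trustworthiness", "metric_values": 0.9})` returns `{"trustworthiness": 0.9}`.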

27 changes: 14 additions & 13 deletions _viash.yaml
@@ -1,4 +1,4 @@
viash_version: 0.9.0-RC7
viash_version: 0.9.0

# Step 1: Change the name of the task.
# example: task_name_of_this_task
@@ -34,7 +34,7 @@ description: |
of continuous trajectories. Despite almost every single-cell study using one of these visualisations
there has been debate as to whether they can effectively capture the variation in single-cell
datasets [@chari2023speciousart].
The dimensionality reduction task attempts to quantify the ability of methods to embed the
information present in complex single-cell studies into a two-dimensional space. Thus, this task
is specifically designed for dimensionality reduction for visualisation and does not consider other
@@ -61,7 +61,7 @@ description: |
# publisher = {Research Square},
# year = {2021},
# }

info:
image: thumbnail.svg
# Step 5: Replace the task_template to the name of the task.
@@ -74,41 +74,42 @@ info:
dest: resources_test/dimensionality_reduction

# Step 6: Update the authors of the task.
authors:
authors:
- name: Luke Zappia
roles: [ maintainer, author ]
roles: [maintainer, author]
info:
github: lazappi
orcid: 0000-0001-7744-8565
- name: Michal Klein
roles: [ author ]
roles: [author]
info:
github: michalk8
- name: Scott Gigante
roles: [ author ]
roles: [author]
info:
github: scottgigante
orcid: "0000-0002-4544-2764"
- name: Ben DeMeo
roles: [ author ]
roles: [author]
info:
github: bendemeo
- name: Robrecht Cannoodt
roles: [ author ]
roles: [author]
info:
github: rcannood
orcid: 0000-0003-3641-729X
- name: Kai Waldrant
roles: [ contributor ]
roles: [contributor]
info:
github: KaiWaldrant
orcid: 0009-0003-8555-1361
- name: Sai Nirmayi Yasa
roles: [ contributor ]
roles: [contributor]
info:
github: sainirmayi
orcid: 0009-0003-6319-9803
- name: Juan A. Cordero Varela
roles: [ contributor ]
roles: [contributor]
info:
github: jacorvar
orcid: 0000-0002-7373-5433
@@ -123,4 +124,4 @@ repositories:
- name: openproblems-v2
type: github
repo: openproblems-bio/openproblems-v2
tag: main_build
tag: main_build
2 changes: 1 addition & 1 deletion common
4 changes: 2 additions & 2 deletions src/api/comp_process_dataset.yaml
@@ -20,7 +20,7 @@ arguments:
direction: output
required: true
test_resources:
- path: /resources_test/dimensionality_reduction/pancreas/
dest: resources_test/dimensionality_reduction/pancreas/
- path: /resources_test/common/pancreas/
dest: resources_test/common/pancreas/
- type: python_script
path: /common/component_tests/run_and_check_output.py
6 changes: 3 additions & 3 deletions src/api/file_common_dataset.yaml
@@ -1,11 +1,11 @@
type: file
example: "resources_test/dimensionality_reduction/pancreas/dataset.h5ad"
example: "resources_test/common/pancreas/dataset.h5ad"
label: "Dataset"
summary: "The dataset to pass to a method."
info:
format:
type: h5ad
layers:
layers:
- type: integer
name: counts
description: Raw counts
@@ -14,7 +14,7 @@ info:
name: normalized
description: Normalized expression values
required: true
obs:
obs:
- type: string
name: cell_type
description: Classification of the cell type based on its characteristics and function within the tissue or organism.
2 changes: 1 addition & 1 deletion src/api/file_dataset.yaml
@@ -5,7 +5,7 @@ summary: "The dataset to pass to a method."
info:
format:
type: h5ad
layers:
layers:
- type: integer
name: counts
description: Raw counts
1 change: 0 additions & 1 deletion src/api/file_embedding.yaml
@@ -23,4 +23,3 @@ info:
name: normalization_id
description: "Which normalization was used"
required: true

4 changes: 2 additions & 2 deletions src/api/file_score.yaml
@@ -22,9 +22,9 @@ info:
name: metric_ids
description: "One or more unique metric identifiers"
multiple: true
required: true
required: true
- type: double
name: metric_values
description: "The metric values obtained for the given prediction. Must be of same length as 'metric_ids'."
multiple: true
required: true
required: true
4 changes: 2 additions & 2 deletions src/api/file_solution.yaml
@@ -5,7 +5,7 @@ summary: "The data for evaluating a dimensionality reduction."
info:
format:
type: h5ad
layers:
layers:
- type: integer
name: counts
description: Raw counts
@@ -14,7 +14,7 @@ info:
name: normalized
description: Normalized expression values
required: true
obs:
obs:
- type: string
name: cell_type
description: Classification of the cell type based on its characteristics and function within the tissue or organism.