Commit

baseline models of cell type specific, metacell, and pearson added
janursa committed Sep 12, 2024
1 parent 1790e5c commit 5db75e1
Showing 46 changed files with 808 additions and 4,738 deletions.
31 changes: 31 additions & 0 deletions .gitignore
@@ -0,0 +1,31 @@
# mac os x
.DS_Store

# related to files
params*
resources*
output/
target/
out/
local/
datasets_raw/
state*
trace*
tw-*

# related to python
.ipynb_checkpoints
__pycache__/

# ?
.$*
bin*
bin/

# related to nextflow
work
.nextflow*

# IDE related
.idea
.vscode
171 changes: 84 additions & 87 deletions README.md
@@ -7,9 +7,7 @@ Do not edit this file directly.
-->

Benchmarking GRN inference methods. The full documentation is hosted on
[ReadTheDocs](https://openproblems-grn-task.readthedocs.io/en/latest/index.html).
[![Documentation
Status](https://readthedocs.org/projects/grn-inference-benchmarking/badge/?version=latest.png)](https://grn-inference-benchmarking.readthedocs.io/en/latest/?badge=latest)
[ReadTheDocs](https://grn-inference-benchmarking.readthedocs.io/en/latest/index.html).

Path to source:
[`src`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src)
@@ -115,39 +113,37 @@ approaches to assess both accuracy and comprehensiveness.

``` mermaid
flowchart LR
file_perturbation_h5ad("perturbation")
comp_control_method[/"Control Method"/]
comp_metric[/"Label"/]
file_prediction("GRN")
file_score("Score")
file_multiomics_rna_h5ad("multiomics rna")
comp_method[/"Method"/]
file_prediction("GRN")
comp_metric[/"Label"/]
file_score("Score")
file_multiomics_atac_h5ad("multiomics atac")
file_perturbation_h5ad("perturbation")
comp_control_method[/"Control Method"/]
comp_method_r[/"Method r"/]
file_perturbation_h5ad---comp_control_method
file_perturbation_h5ad---comp_metric
comp_control_method-->file_prediction
comp_metric-->file_score
file_prediction---comp_metric
file_multiomics_rna_h5ad---comp_method
comp_method-->file_prediction
file_prediction---comp_metric
comp_metric-->file_score
file_multiomics_atac_h5ad---comp_method
file_perturbation_h5ad---comp_metric
comp_control_method-->file_prediction
comp_method_r-->file_prediction
```

## File format: perturbation
## File format: multiomics rna

Perturbation dataset for benchmarking.
RNA expression for multiomics data.

Example file: `resources_test/grn-benchmark/perturbation_data.h5ad`
Example file: `resources_test/grn-benchmark/multiomics_rna.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type', 'sm_name', 'donor_id', 'plate_name', 'row', 'well', 'cell_count'
layers: 'n_counts', 'pearson', 'lognorm'
obs: 'cell_type', 'donor_id'

</div>

@@ -158,60 +154,30 @@ Slot description:
| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["sm_name"]` | `string` | The primary name for the (parent) compound (in a standardized representation) as chosen by LINCS. This is provided to map the data in this experiment to the LINCS Connectivity Map data. |
| `obs["donor_id"]` | `string` | Donor id. |
| `obs["plate_name"]` | `string` | Plate name (6 levels). |
| `obs["row"]` | `string` | Row name on the plate. |
| `obs["well"]` | `string` | Well name on the plate. |
| `obs["cell_count"]` | `string` | Number of single cells pseudobulked. |
| `layers["n_counts"]` | `double` | Pseudobulked values using mean approach. |
| `layers["pearson"]` | `double` | (*Optional*) Normalized values using pearson residuals. |
| `layers["lognorm"]` | `double` | (*Optional*) Normalized values using shifted logarithm. |

</div>

## Component type: Control Method

Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/control_methods)

A control method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--perturbation_data` | `file` | Perturbation dataset for benchmarking. |
| `--layer` | `string` | (*Optional*) Which layer of perturbation data to use to find TF-gene relationships. Default: `scgen_pearson`. |
| `--prediction` | `file` | (*Output*) GRN prediction. |
| `--tf_all` | `file` | (*Optional*) NA. |

</div>

## Component type: Label
## Component type: Method

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/metrics)
[`src/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/methods)

A metric to evaluate the performance of the inferred GRN
A GRN inference method

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--perturbation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. |
| `--prediction` | `file` | GRN prediction. |
| `--score` | `file` | (*Optional, Output*) File indicating the score of a metric. |
| `--reg_type` | `string` | (*Optional*) Name of the regression to use. Default: `ridge`. |
| `--subsample` | `integer` | (*Optional*) Number of samples randomly drawn from perturbation data. Default: `-2`. |
| `--max_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--method_id` | `string` | (*Optional*) NA. |
| `--tf_all` | `file` | (*Optional*) NA. |
| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
| `--multiomics_rna` | `file` | (*Optional*) RNA expression for multiomics data. Default: `resources/grn-benchmark/multiomics_rna.h5ad`. |
| `--multiomics_atac` | `file` | (*Optional*) Peak data for multiomics data. Default: `resources/grn-benchmark/multiomics_atac.h5ad`. |
| `--prediction` | `file` | (*Optional, Output*) GRN prediction. Default: `output/prediction.csv`. |
| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--tf_all` | `file` | (*Optional*) NA. Default: `resources/prior/tf_all.csv`. |
| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |

</div>
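
The argument table above can be read as a command-line interface. The following is a minimal sketch, assuming a plain `argparse` wrapper; the function name `build_method_parser` is hypothetical, and the repository's components may wire their arguments differently. Only the argument names, types, and defaults are taken from the table.

```python
import argparse


def build_method_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Method arguments table (not the repository's code)."""
    parser = argparse.ArgumentParser(description="A GRN inference method (sketch)")
    # File inputs, with the defaults listed in the table.
    parser.add_argument("--multiomics_rna", default="resources/grn-benchmark/multiomics_rna.h5ad")
    parser.add_argument("--multiomics_atac", default="resources/grn-benchmark/multiomics_atac.h5ad")
    parser.add_argument("--tf_all", default="resources/prior/tf_all.csv")
    # Output location and scratch directory.
    parser.add_argument("--prediction", default="output/prediction.csv")
    parser.add_argument("--temp_dir", default="output/temdir")
    # Integer options.
    parser.add_argument("--num_workers", type=int, default=4)
    parser.add_argument("--max_n_links", type=int, default=50000)
    return parser


# Override one option and keep every other default.
args = build_method_parser().parse_args(["--num_workers", "8"])
print(args.num_workers, args.max_n_links, args.prediction)
```

Because every argument is optional and carries a default, calling the parser with an empty argument list yields a complete configuration.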

@@ -242,6 +208,32 @@ Slot description:

</div>

## Component type: Label

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/metrics)

A metric to evaluate the performance of the inferred GRN

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--perturbation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. Default: `resources/grn-benchmark/perturbation_data.h5ad`. |
| `--prediction` | `file` | GRN prediction. |
| `--score` | `file` | (*Optional, Output*) File indicating the score of a metric. Default: `output/score.h5ad`. |
| `--reg_type` | `string` | (*Optional*) Name of the regression to use. Default: `ridge`. |
| `--subsample` | `integer` | (*Optional*) Number of samples randomly drawn from perturbation data. Default: `-2`. |
| `--max_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--method_id` | `string` | (*Optional*) NA. |
| `--tf_all` | `file` | (*Optional*) NA. Default: `resources/prior/tf_all.csv`. |
| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |
| `--clip_scores` | `boolean` | (*Optional*) Clips the R2 score of each gene to the range \[0, 1\]. Default: `TRUE`. |

</div>
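
To illustrate what `--clip_scores` describes, here is a minimal sketch of clipping per-gene R2 scores to \[0, 1\]. The function names, the example gene names and values, and the choice to aggregate by the mean are all assumptions made for illustration; the table only states that per-gene scores are clipped.

```python
from typing import Dict


def clip_gene_scores(r2_by_gene: Dict[str, float],
                     low: float = 0.0,
                     high: float = 1.0) -> Dict[str, float]:
    """Clamp each gene's R2 score into [low, high]."""
    return {gene: min(max(score, low), high) for gene, score in r2_by_gene.items()}


def mean_score(r2_by_gene: Dict[str, float], clip_scores: bool = True) -> float:
    """Average the per-gene scores (the mean is an assumed aggregation)."""
    scores = clip_gene_scores(r2_by_gene) if clip_scores else dict(r2_by_gene)
    return sum(scores.values()) / len(scores)


# Made-up scores: a negative R2 means the model is worse than predicting the mean.
example = {"GATA1": -0.4, "SPI1": 0.25, "TCF7": 0.75}
print(clip_gene_scores(example))
print(mean_score(example, clip_scores=True), mean_score(example, clip_scores=False))
```

Clipping keeps a few badly predicted genes with large negative R2 from dominating the aggregate, at the cost of hiding how poor those fits are.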

## File format: Score

File indicating the score of a metric.
@@ -270,11 +262,11 @@ Slot description:

</div>

## File format: multiomics rna
## File format: multiomics atac

RNA expression for multiomics data.
Peak data for multiomics data.

Example file: `resources_test/grn-benchmark/multiomics_rna.h5ad`
Example file: `resources_test/grn-benchmark/multiomics_atac.h5ad`

Format:

@@ -296,52 +288,57 @@ Slot description:

</div>

## Component type: Method
## File format: perturbation

Path:
[`src/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/methods)
Perturbation dataset for benchmarking.

A GRN inference method
Example file: `resources_test/grn-benchmark/perturbation_data.h5ad`

Arguments:
Format:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--multiomics_rna` | `file` | (*Optional*) RNA expression for multiomics data. |
| `--multiomics_atac` | `file` | (*Optional*) Peak data for multiomics data. |
| `--prediction` | `file` | (*Optional, Output*) GRN prediction. |
| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--tf_all` | `file` | (*Optional*) NA. |
| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |
AnnData object
obs: 'cell_type', 'sm_name', 'donor_id', 'plate_name', 'row', 'well', 'cell_count'
layers: 'n_counts', 'pearson', 'lognorm'

</div>

## File format: multiomics atac
Slot description:

Peak data for multiomics data.
<div class="small">

Example file: `resources_test/grn-benchmark/multiomics_atac.h5ad`
| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["sm_name"]` | `string` | The primary name for the (parent) compound (in a standardized representation) as chosen by LINCS. This is provided to map the data in this experiment to the LINCS Connectivity Map data. |
| `obs["donor_id"]` | `string` | Donor id. |
| `obs["plate_name"]` | `string` | Plate name (6 levels). |
| `obs["row"]` | `string` | Row name on the plate. |
| `obs["well"]` | `string` | Well name on the plate. |
| `obs["cell_count"]` | `string` | Number of single cells pseudobulked. |
| `layers["n_counts"]` | `double` | Pseudobulked values using mean approach. |
| `layers["pearson"]` | `double` | (*Optional*) Normalized values using pearson residuals. |
| `layers["lognorm"]` | `double` | (*Optional*) Normalized values using shifted logarithm. |

Format:
</div>
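
The slot table describes `layers["n_counts"]` as pseudobulked with a mean approach and `obs["cell_count"]` as the number of single cells pseudobulked. A minimal sketch of that idea in plain Python follows. The grouping key (cell type and compound), the function name `pseudobulk_mean`, and the toy values are assumptions; the actual preprocessing may group cells differently.

```python
from typing import Dict, List, Sequence, Tuple

Group = Tuple[str, ...]


def pseudobulk_mean(groups: Sequence[Group],
                    counts: Sequence[Sequence[float]]
                    ) -> Tuple[Dict[Group, List[float]], Dict[Group, int]]:
    """Average per-cell count vectors within each group.

    Returns the per-group mean vectors and the number of cells in each group.
    """
    sums: Dict[Group, List[float]] = {}
    sizes: Dict[Group, int] = {}
    for key, row in zip(groups, counts):
        if key not in sums:
            sums[key] = [0.0] * len(row)
            sizes[key] = 0
        for j, value in enumerate(row):
            sums[key][j] += value
        sizes[key] += 1
    means = {key: [total / sizes[key] for total in totals]
             for key, totals in sums.items()}
    return means, sizes


# Three toy cells, two genes; the (cell_type, sm_name) key is an assumed grouping.
groups = [("T cells", "drugA"), ("T cells", "drugA"), ("B cells", "drugA")]
counts = [[2, 0], [4, 2], [1, 1]]
means, cell_count = pseudobulk_mean(groups, counts)
print(means)
print(cell_count)
```

The second return value plays the role of `cell_count`: it records how many single cells contributed to each pseudobulk sample.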

<div class="small">
## Component type: Control Method

AnnData object
obs: 'cell_type', 'donor_id'
Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/control_methods)

</div>
A control method.

Slot description:
Arguments:

<div class="small">

| Slot | Type | Description |
| Name | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["donor_id"]` | `string` | Donor id. |
| `--layer` | `string` | (*Optional*) Which layer of perturbation data to use to find TF-gene relationships. Default: `scgen_pearson`. |
| `--prediction` | `file` | (*Optional, Output*) GRN prediction. |
| `--tf_all` | `file` | NA. |

</div>

8 changes: 0 additions & 8 deletions params/celloracle.yaml

This file was deleted.
