diff --git a/README.md b/README.md index da3ffe5..42db01a 100644 --- a/README.md +++ b/README.md @@ -1,37 +1,346 @@ -# Task Template +# Template -This repo is a template to create a new task for the OpenProblems v2. This repo contains several example files and components that can be used when updated with the task info. -> [!WARNING] -> This README will be overwritten when performing the `create_task_readme` script. + -## Create a repository from this template +A one-sentence summary of purpose and methodology. Used for creating +overview tables. -> [!IMPORTANT] -> Before creating a new repository, make sure you are part of the OpenProblems task team. This will be done when you create an issue for the task and you get the go ahead to create the task. -> For more information on how to create a new task, check out the [Create a new task](https://openproblems.bio/documentation/create_task/) documentation. +Repository: [rcannood/test](https://github.com/rcannood/test) -The instructions below will guide you through creating a new repository from this template ([creating-a-repository-from-a-template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template#creating-a-repository-from-a-template)). +## Description +Provide a clear and concise description of your task, detailing the +specific problem it aims to solve. Outline the input data types, the +expected output, and any assumptions or constraints. Be sure to explain +any terminology or concepts that are essential for understanding the +task. -* Click the "Use this template" button on the top right of the repository. -* Use the Owner dropdown menu to select the `openproblems-bio` account. -* Type a name for your repository (task_...), and a description. -* Set the repository visibility to public. -* Click "Create repository from template". +Explain the motivation behind your proposed task.
Describe the +biological or computational problem you aim to address and why it’s +important. Discuss the current state of research in this area and any +gaps or challenges that your task could help address. This section +should convince readers of the significance and relevance of your task. -## Clone the repository +## Authors & contributors -To clone the repository with the submodule files, you can use the following command: +| name | roles | +|:---------|:-------------------| +| John Doe | author, maintainer | -```bash -git clone --recursive git@github.com:openproblems-bio/.git +## API + +``` mermaid +flowchart LR + file_common_dataset("Common Dataset") + comp_data_processor[/"Data processor"/] + file_solution("Solution") + file_test_h5ad("Test data") + file_train_h5ad("Training data") + comp_control_method[/"Control Method"/] + comp_metric[/"Metric"/] + comp_method[/"Method"/] + file_prediction("Predicted data") + file_score("Score") + file_common_dataset---comp_data_processor + comp_data_processor-->file_solution + comp_data_processor-->file_test_h5ad + comp_data_processor-->file_train_h5ad + file_solution---comp_control_method + file_solution---comp_metric + file_test_h5ad---comp_control_method + file_test_h5ad---comp_method + file_train_h5ad---comp_control_method + file_train_h5ad---comp_method + comp_control_method-->file_prediction + comp_metric-->file_score + comp_method-->file_prediction + file_prediction---comp_metric ``` ->[!NOTE] -> If somehow there are no files visible in the submodule after cloning using the above command. Check the instructions [here](common/README.md). -## What to do next +## File format: Common Dataset + +A subset of the common dataset. + +Example file: `resources_test/common/pancreas/dataset.h5ad` + +Format: + +
+ + AnnData object + obs: 'cell_type', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `obs["cell_type"]` | `string` | Cell type information. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized expression values. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
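The slot table above can also serve as a machine-readable checklist. The following Python sketch is illustrative only. `COMMON_DATASET_SLOTS` and `missing_slots` are hypothetical names that are not part of the task tooling, and the required/optional flags are transcribed from the table. The sketch relies only on membership tests such as `key in adata.obs`. Those tests should also work on an `anndata.AnnData` object loaded with `anndata.read_h5ad`, because `obs` and `var` are pandas DataFrames (where `in` checks column names) and `obsm`, `layers` and `uns` behave like mappings.

```python
from types import SimpleNamespace

# (slot, key, required) triples transcribed from the "Common Dataset" table.
# Hypothetical helper data; not generated by the task tooling.
COMMON_DATASET_SLOTS = [
    ("obs", "cell_type", True),
    ("obs", "batch", True),
    ("var", "hvg", True),
    ("var", "hvg_score", True),
    ("obsm", "X_pca", True),
    ("layers", "counts", True),
    ("layers", "normalized", True),
    ("uns", "dataset_id", True),
    ("uns", "dataset_name", True),
    ("uns", "dataset_url", False),
    ("uns", "dataset_reference", False),
    ("uns", "dataset_summary", True),
    ("uns", "dataset_description", True),
    ("uns", "dataset_organism", False),
    ("uns", "normalization_id", True),
]


def missing_slots(adata, spec):
    """Return 'slot["key"]' strings for required entries absent from adata."""
    missing = []
    for slot, key, required in spec:
        container = getattr(adata, slot, None)
        present = container is not None and key in container
        if required and not present:
            missing.append(f'{slot}["{key}"]')
    return missing


if __name__ == "__main__":
    # Stand-in built from plain dicts; the values are made up for illustration.
    toy = SimpleNamespace(
        obs={"cell_type": [], "batch": []},
        var={"hvg": [], "hvg_score": []},
        obsm={"X_pca": []},
        layers={"counts": [], "normalized": []},
        uns={"dataset_id": "toy", "dataset_name": "Toy", "normalization_id": "toy_norm"},
    )
    print(missing_slots(toy, COMMON_DATASET_SLOTS))
```

Optional slots (`dataset_url`, `dataset_reference`, `dataset_organism`) are never reported, so an empty list means all required entries are present.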
+ +## Component type: Data processor + +A data processor that splits a common dataset into training data, test data, and the solution for the test data. + +Arguments: +
+ +| Name | Type | Description | +|:---|:---|:---| +| `--input` | `file` | A subset of the common dataset. | +| `--output_train` | `file` | (*Output*) The training data in h5ad format. | +| `--output_test` | `file` | (*Output*) The subset of molecules used for the test dataset. | +| `--output_solution` | `file` | (*Output*) The solution for the test data. | + +
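The argument table fixes what the data processor writes but not how it splits the cells. The sketch below assumes a plain random split of cells; the real processor may split differently, for example by batch. It uses lists of dicts as hypothetical stand-ins for `obs` rows. The parts that do follow from the file formats are these:

* `obs["cell_type"]` becomes `obs["label"]`.
* The test data drops the label.
* The training and test data keep only `dataset_id` and `normalization_id` in `uns`.
* The solution keeps the full metadata.

`split_dataset` is a hypothetical name.

```python
import random


def split_dataset(cells, uns, test_fraction=0.2, seed=0):
    """Hypothetical split of a common dataset into train/test/solution.

    cells: list of dicts with "cell_type" and "batch" keys (stand-ins for obs rows).
    uns: dict holding the common dataset's uns entries.
    Assumption: a simple random split of cells, not necessarily the real one.
    """
    rng = random.Random(seed)
    indices = list(range(len(cells)))
    rng.shuffle(indices)
    n_test = round(len(cells) * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])

    # Train and test keep only the two identifiers listed in their uns tables.
    slim_uns = {k: uns[k] for k in ("dataset_id", "normalization_id")}

    train = {
        "obs": [{"label": cells[i]["cell_type"], "batch": cells[i]["batch"]} for i in train_idx],
        "uns": dict(slim_uns),
    }
    # The test data carries no label, so methods cannot see the answer.
    test = {
        "obs": [{"batch": cells[i]["batch"]} for i in test_idx],
        "uns": dict(slim_uns),
    }
    # The solution holds the held-out labels, in the same order as the test cells.
    solution = {
        "obs": [{"label": cells[i]["cell_type"], "batch": cells[i]["batch"]} for i in test_idx],
        "uns": dict(uns),
    }
    return train, test, solution
```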
+ +## File format: Solution + +The solution for the test data. + +Example file: `resources_test/task_template/pancreas/solution.h5ad` + +Format: +
+ + AnnData object + obs: 'label', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `obs["label"]` | `string` | Ground truth cell type labels. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["dataset_name"]` | `string` | Nicely formatted name. | +| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. | +| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. | +| `uns["dataset_summary"]` | `string` | Short description of the dataset. | +| `uns["dataset_description"]` | `string` | Long description of the dataset. | +| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
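A metric reads the solution next to a prediction, so the solution should describe the same cells as the test data. The README does not state this explicitly. The check below therefore assumes that the two files share `obs_names` in the same order and have the same `uns` identifiers. `check_solution_matches_test` is a hypothetical helper. It works with any object exposing `obs_names` and a dict-like `uns`, such as an `anndata.AnnData`.

```python
from types import SimpleNamespace


def check_solution_matches_test(test, solution):
    """Return a list of problems; an empty list means the pair looks consistent.

    Assumption (not stated in the README): the solution lists the same cells,
    in the same order, as the test data.
    """
    problems = []
    if list(test.obs_names) != list(solution.obs_names):
        problems.append("obs_names differ between test and solution")
    for key in ("dataset_id", "normalization_id"):
        if test.uns.get(key) != solution.uns.get(key):
            problems.append(f'uns["{key}"] differs')
    return problems


if __name__ == "__main__":
    test = SimpleNamespace(obs_names=["c1", "c2"], uns={"dataset_id": "d", "normalization_id": "n"})
    solution = SimpleNamespace(obs_names=["c1", "c2"], uns={"dataset_id": "d", "normalization_id": "n"})
    print(check_solution_matches_test(test, solution))
```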
+ +## File format: Test data + +The subset of molecules used for the test dataset. + +Example file: `resources_test/task_template/pancreas/test.h5ad` + +Format: +
+ + AnnData object + obs: 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
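Methods see the test data, so any slot beyond those listed above could leak the answer. The most obvious example is `obs["label"]`. This sketch lists keys that are present but not in the table. `TEST_DATA_ALLOWED` and `unexpected_slots` are hypothetical names. Iterating a pandas DataFrame yields its column names, so the same loop should also apply to the `obs` and `var` slots of an `AnnData` object.

```python
from types import SimpleNamespace

# Allowed keys per slot, transcribed from the "Test data" table (hypothetical helper data).
TEST_DATA_ALLOWED = {
    "obs": {"batch"},
    "var": {"hvg", "hvg_score"},
    "obsm": {"X_pca"},
    "layers": {"counts", "normalized"},
    "uns": {"dataset_id", "normalization_id"},
}


def unexpected_slots(adata, allowed):
    """Return 'slot["key"]' strings for keys that the format does not list."""
    extras = []
    for slot, keys in allowed.items():
        for key in getattr(adata, slot, {}):
            if key not in keys:
                extras.append(f'{slot}["{key}"]')
    return extras


if __name__ == "__main__":
    leaky = SimpleNamespace(obs={"batch": [], "label": []}, uns={"dataset_id": "d"})
    print(unexpected_slots(leaky, TEST_DATA_ALLOWED))
```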
+ +## File format: Training data + +The training data in h5ad format. + +Example file: `resources_test/task_template/pancreas/train.h5ad` + +Format: +
+ + AnnData object + obs: 'label', 'batch' + var: 'hvg', 'hvg_score' + obsm: 'X_pca' + layers: 'counts', 'normalized' + uns: 'dataset_id', 'normalization_id' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `obs["label"]` | `string` | Ground truth cell type labels. | +| `obs["batch"]` | `string` | Batch information. | +| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. | +| `var["hvg_score"]` | `double` | A ranking of the features by hvg. | +| `obsm["X_pca"]` | `double` | The resulting PCA embedding. | +| `layers["counts"]` | `integer` | Raw counts. | +| `layers["normalized"]` | `double` | Normalized counts. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | + +
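Unlike the test data, the training data carries `obs["label"]` alongside `obs["batch"]`. This makes it easy to inspect how cell types are spread across batches before fitting a method. Below is a minimal sketch with the hypothetical function name `label_counts_by_batch`. It takes the two columns as plain sequences, for example `list(adata.obs["label"])` and `list(adata.obs["batch"])`.

```python
from collections import Counter, defaultdict


def label_counts_by_batch(labels, batches):
    """Count obs["label"] values per obs["batch"] value in the training data."""
    if len(labels) != len(batches):
        raise ValueError("labels and batches must have one entry per cell")
    counts = defaultdict(Counter)
    for label, batch in zip(labels, batches):
        counts[batch][label] += 1
    # Plain dicts are easier to print and compare than nested Counters.
    return {batch: dict(counter) for batch, counter in counts.items()}
```

A label that occurs in only one batch is a hint that a method could confuse batch effects with cell-type signal.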
+ +## Component type: Control Method + +Quality control methods for verifying the pipeline. + +Arguments: + +
+ +| Name | Type | Description | +|:---|:---|:---| +| `--input_train` | `file` | The training data in h5ad format. | +| `--input_test` | `file` | The subset of molecules used for the test dataset. | +| `--input_solution` | `file` | The solution for the test data. | +| `--output` | `file` | (*Output*) A predicted dataset as output by a method. | + +
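Control methods differ from regular methods in that they also receive `--input_solution`. This README does not prescribe any particular controls. A common pattern, assumed here, is to pair two controls whose scores bracket what a real method should achieve:

* a positive control that copies the ground truth;
* a negative control that ignores the test cells entirely.

Both functions below are hypothetical sketches that work on plain label lists.

```python
import random


def positive_control(solution_labels):
    """Hypothetical 'perfect' control: return the ground-truth labels unchanged."""
    return list(solution_labels)


def negative_control(train_labels, n_test, seed=0):
    """Hypothetical 'uninformed' control: draw training labels at random.

    Sampling from the training labels means only labels that exist can be predicted.
    """
    rng = random.Random(seed)
    return [rng.choice(train_labels) for _ in range(n_test)]
```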
+ +## Component type: Metric + +A metric that scores a prediction by comparing it against the solution for the test data. + +Arguments: +
+ +| Name | Type | Description | +|:---|:---|:---| +| `--input_solution` | `file` | The solution for the test data. | +| `--input_prediction` | `file` | A predicted dataset as output by a method. | +| `--output` | `file` | (*Output*) File indicating the score of a metric. | + +
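The metric compares `obs["label"]` of the solution with `obs["label_pred"]` of the prediction and writes the Score format. The template does not name a specific metric, so the sketch below uses accuracy as an example. It assumes both label lists are in the same cell order and copies the identifiers from the prediction's `uns`. It returns a dict shaped like the Score file's `uns` slot. `accuracy_score_uns` is a hypothetical name.

```python
def accuracy_score_uns(solution_labels, predicted_labels, prediction_uns):
    """Compute accuracy and package it like the Score file's uns slot.

    Assumption: predictions are listed in the same cell order as the solution.
    """
    if len(solution_labels) != len(predicted_labels):
        raise ValueError("prediction must contain one label per solution cell")
    if not solution_labels:
        raise ValueError("solution is empty")
    correct = sum(t == p for t, p in zip(solution_labels, predicted_labels))
    return {
        "dataset_id": prediction_uns["dataset_id"],
        "normalization_id": prediction_uns["normalization_id"],
        "method_id": prediction_uns["method_id"],
        # metric_ids and metric_values must have the same length.
        "metric_ids": ["accuracy"],
        "metric_values": [correct / len(solution_labels)],
    }
```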
+ +## Component type: Method + +A method that predicts labels for the test cells from the training data. + +Arguments: +
+ +| Name | Type | Description | +|:---|:---|:---| +| `--input_train` | `file` | The training data in h5ad format. | +| `--input_test` | `file` | The subset of molecules used for the test dataset. | +| `--output` | `file` | (*Output*) A predicted dataset as output by a method. | + +
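A method sees only the training and test data. The sketch below is an illustration, not the template's method: a nearest-centroid classifier on `obsm["X_pca"]`. It averages the training embeddings per label. It then assigns each test cell the label of the closest centroid, using Euclidean distance via `math.dist` (Python 3.8+). It assumes that the training and test embeddings share one PCA space, which holds if the PCA was computed before the split. `nearest_centroid` is a hypothetical name.

```python
import math
from collections import defaultdict


def nearest_centroid(train_pca, train_labels, test_pca):
    """Hypothetical baseline: give each test cell the label of the closest class centroid.

    train_pca / test_pca: lists of equal-length coordinate lists (rows of obsm["X_pca"]).
    train_labels: obs["label"] of the training data, one entry per training row.
    """
    if len(train_pca) != len(train_labels):
        raise ValueError("one label per training row is required")
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(train_pca, train_labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + x for s, x in zip(sums[label], row)]
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in total] for label, total in sums.items()}
    # min() breaks ties by the order in which labels first appeared in training.
    return [min(centroids, key=lambda lab: math.dist(row, centroids[lab])) for row in test_pca]
```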
+ +## File format: Predicted data + +A predicted dataset as output by a method. + +Example file: `resources_test/task_template/pancreas/prediction.h5ad` + +Format: + +
+ + AnnData object + obs: 'label_pred' + uns: 'dataset_id', 'normalization_id', 'method_id' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:--------------------------|:---------|:-------------------------------------| +| `obs["label_pred"]` | `string` | Predicted labels for the test cells. | +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | + +
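A method's output needs one `label_pred` per test cell plus three identifiers in `uns`. Two of those identifiers can be copied from the test data. The hypothetical helper below assembles that structure as plain dicts. With the `anndata` package, the same pieces could presumably be passed to `anndata.AnnData(obs=..., uns=...)` and saved with `write_h5ad`, but that step is not shown.

```python
def make_prediction(test_uns, n_test_cells, labels, method_id):
    """Assemble a dict mirroring the Predicted data slots (hypothetical helper).

    dataset_id and normalization_id are copied from the test data so that a
    metric can match the prediction with the right solution.
    """
    if len(labels) != n_test_cells:
        raise ValueError(f"expected {n_test_cells} labels, got {len(labels)}")
    return {
        "obs": {"label_pred": list(labels)},
        "uns": {
            "dataset_id": test_uns["dataset_id"],
            "normalization_id": test_uns["normalization_id"],
            "method_id": method_id,
        },
    }
```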
+ +## File format: Score + +File indicating the score of a metric. + +Example file: `resources/score.h5ad` + +Format: + +
+ + AnnData object + uns: 'dataset_id', 'normalization_id', 'method_id', 'metric_ids', 'metric_values' + +
+ +Data structure: + +
+ +| Slot | Type | Description | +|:---|:---|:---| +| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. | +| `uns["normalization_id"]` | `string` | Which normalization was used. | +| `uns["method_id"]` | `string` | A unique identifier for the method. | +| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. | +| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be the same length as ‘metric_ids’. | -Check out the [instructions](https://github.com/openproblems-bio/common_resources/blob/main/INSTRUCTIONS.md) for more information on how to update the example files and components. These instructions also contain information on how to build out the task and basic commands. +
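Since `metric_ids` and `metric_values` must line up one-to-one, code that reads score files can pair them into a dictionary and fail loudly on a length mismatch. In the sketch below, `scores_from_uns` is a hypothetical name. It also accepts a lone string id or numeric value and wraps it in a list. That is an assumption about how single-metric files might be stored.

```python
def scores_from_uns(uns):
    """Pair metric_ids with metric_values from a Score file's uns slot (hypothetical helper)."""
    ids = uns["metric_ids"]
    values = uns["metric_values"]
    # Assumption: a single metric may be stored as a scalar instead of a list.
    ids = [ids] if isinstance(ids, str) else list(ids)
    values = [values] if isinstance(values, (int, float)) else list(values)
    if len(ids) != len(values):
        raise ValueError(f"{len(ids)} metric_ids but {len(values)} metric_values")
    return dict(zip(ids, values))
```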
-For more information on the OpenProblems v2, check out the [documentation](https://openproblems.bio/documentation/). \ No newline at end of file
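The API flowchart near the top of this README also implies an execution order. The sketch below encodes the flowchart's nodes as a dependency graph, in which each key maps to the nodes it needs first. It then derives one valid order with `graphlib.TopologicalSorter` from the Python standard library (3.9+). The short node names are hypothetical abbreviations of the flowchart's identifiers.

```python
from graphlib import TopologicalSorter

# node -> nodes it depends on, following the arrows and links of the API flowchart.
# Files and components both appear as nodes. A method and a control method each
# produce their own prediction file; they share one "prediction" node here.
API_GRAPH = {
    "data_processor": {"common_dataset"},
    "solution": {"data_processor"},
    "test_h5ad": {"data_processor"},
    "train_h5ad": {"data_processor"},
    "method": {"train_h5ad", "test_h5ad"},
    "control_method": {"train_h5ad", "test_h5ad", "solution"},
    "prediction": {"method", "control_method"},
    "metric": {"prediction", "solution"},
    "score": {"metric"},
}


def execution_order(graph):
    """Return one order in which every node comes after all of its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(API_GRAPH))
```

Several orders are valid (for example, `solution` and `train_h5ad` can be swapped), so the printed order should not be read as the only correct one.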