Created tutorial for MultiGSEA #5567
base: main
I am unsure if the tutorial should be part of the
We are fine to move it to any other category, but none seems to fit yet. We may create a
@bernt-matthias the transcriptomics topic has a "multi-omics" subsection, could we add it there for now?
@bernt-matthias we can then make a "synthetic topic" by adding a "multi-omics" tag to all tutorials analyzing multi-omics data, and define the topic similar to the plants topic (https://github.com/galaxyproject/training-material/blob/main/metadata/plants.yaml). Then the topic will be shown on the main page, since people interested in this tutorial may not naturally go to the proteomics topic, but tutorials themselves can live in multiple topics (as they do now). What do you think?
This sounds good to me. @tStehling could you move it? @shiltemann do you have some links for @tStehling on how to assign tags / assign the tutorial to the sub-topic?
Thanks a lot for your contribution @tStehling! I've left some comments below, but please let me know if anything is unclear, or if you would like some help doing it :)
Absolutely, @tStehling: in the proteomics topic, the subsection id is
And then simply add a tag of the same name. I will use this to create the "multiomics" synthetic topic later. To add a tag, add the following to the metadata of your tutorial
(and feel free to add more tags in this list as you see fit). Tags are shown as follows under the tutorial name, and can help users identify interesting tutorials.
@shiltemann thank you for the effort.
Very good @tStehling! Just a few mini comments, otherwise good to go from my side.
metadata/multi-omics.yaml
Outdated
tag_based: true
gitter: galaxy-multi-omics:matrix.org
This is different from the one above, and I think neither of them exists ...
I'm also not sure if we should really create a separate room for it or if we should reuse one of the other rooms.
Thanks @tStehling! I pushed some formatting changes and left a couple small comments below
> 3. You can also choose the Gene ID format for every data set. In this tutorial we will use the preset "SYMBOL" for transcriptomics and proteomics. For metabolomics we use HMDB.
> 4. Select in **Supported organisms** the organism of which the data is about. In our case we select `Homo sapiens (Human)`.
> 5. **Pathway databases**: Databases often contain their own format in which pathway definitions are provided. So you can select a relevant database. For the tutorial we choose `KEGG`
> 6. **Combine p-values method**: Choose a method (here `Stouffer` for balanced weighting). To more comprehensively measure a pathway response, multiGSEA provides different approaches to compute an aggregated p value over multiple omics layers. Because no single approach for aggregating p values performs best under all circumstances, Loughin proposed basic recommendations on which method to use depending on structure and expectation of the problem. If small p values should be emphasized, Fisher’s method should be chosen. In cases where p values should be treated equally, Stouffer’s method is preferable. If large p values should be emphasized, the user should select Edgington’s method. Figure 2 indicates the difference between those three methods.
I reformatted the hands-on boxes a bit. The hands-on boxes should be very concise, just telling the user how to configure the tool. I have moved all your (very useful!) explanations to a tip box inside the hands-on box, but you could also just put them in normal text before or after the hands-on box, as you prefer
> 8. Click on `Run Tool`
>
{: .hands_on}
Would it be useful to discuss the output of the tool here? Again, both on a technical level (what is the format, what do the contents mean?) and a biological level (what can we learn from the output?).
Thanks for the comments. I will discuss with @tStehling tomorrow. |
# Preparing the Data

To perform pathway enrichment with MultiGSEA, you'll need omics datasets in the file type TSV. Each individual data set contains four columns representing the feature (denoted as Symbol), the log2 fold change (logFC), the p-value (pValue), and the adjusted p-values (adj.pValue). We'll use example data provided on Zenodo.
Maybe Sebastian can give us a few xrefs to methods that produce the needed values for transcriptomics, metabolomics, and proteomics.
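In case it helps while the xrefs are collected, here is a minimal Python sketch of how an input table could be checked against the four columns described in the paragraph above. The helper name `read_omics_table` is hypothetical and not part of MultiGSEA, the exact column spelling and order are assumed from the tutorial text, and the example rows are made up rather than taken from the Zenodo data.

```python
import csv
import io

# Column names as described in the tutorial text; the exact spelling and
# order expected by the tool are assumptions here.
REQUIRED_COLUMNS = ["Symbol", "logFC", "pValue", "adj.pValue"]


def read_omics_table(handle):
    """Parse a tab-separated omics table and return a list of row dicts.

    Hypothetical helper for illustration only, not part of MultiGSEA.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append({
                "Symbol": row["Symbol"],
                "logFC": float(row["logFC"]),
                "pValue": float(row["pValue"]),
                "adj.pValue": float(row["adj.pValue"]),
            })
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: missing or non-numeric value") from exc
    return rows


# Made-up example rows, not taken from the Zenodo data.
example = (
    "Symbol\tlogFC\tpValue\tadj.pValue\n"
    "GENE_A\t1.5\t0.001\t0.01\n"
    "GENE_B\t-0.7\t0.2\t0.4\n"
)
rows = read_omics_table(io.StringIO(example))
print(len(rows), rows[0]["Symbol"], rows[0]["logFC"])
```

A check like this only confirms the layout; it says nothing about which upstream method produced the fold changes and p-values.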
>
> > <tip-title>About the parameters</tip-title>
> > - **Pathway databases**: `KEGG`. Databases often contain their own format in which pathway definitions are provided. So you can select a relevant database. For the tutorial we choose `KEGG`
> > - **Combine p-values method**: Choose a method (here `Stouffer` for balanced weighting). To more comprehensively measure a pathway response, multiGSEA provides different approaches to compute an aggregated p value over multiple omics layers. Because no single approach for aggregating p values performs best under all circumstances, Loughin proposed basic recommendations on which method to use depending on structure and expectation of the problem. If small p values should be emphasized, Fisher’s method should be chosen. In cases where p values should be treated equally, Stouffer’s method is preferable. If large p values should be emphasized, the user should select Edgington’s method. Figure 2 indicates the difference between those three methods.
Do we need a reference for Loughin?
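To make the difference between the three methods in the tip box concrete, here is a minimal Python sketch of the textbook, unweighted forms of Fisher's, Stouffer's, and Edgington's methods. This is an illustration under stated assumptions, not necessarily how multiGSEA computes them: the function names are hypothetical, and all p values are assumed to be independent and to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def _check(pvalues):
    # Assumption of this sketch: at least one p value, all strictly in (0, 1).
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p values must lie strictly between 0 and 1")


def fisher(pvalues):
    """Fisher: X = -2 * sum(ln p) follows chi-square with 2k degrees of freedom."""
    _check(pvalues)
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function, valid for even degrees of freedom.
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


def stouffer(pvalues):
    """Stouffer: average the z-scores of the p values (equal weights)."""
    _check(pvalues)
    std = NormalDist()
    # -inv_cdf(p) equals inv_cdf(1 - p) by symmetry and is safer for tiny p.
    z = sum(-std.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - std.cdf(z)


def edgington(pvalues):
    """Edgington: the sum of k uniform p values follows an Irwin-Hall distribution."""
    _check(pvalues)
    k = len(pvalues)
    s = sum(pvalues)
    # Irwin-Hall CDF; the alternating sum can lose precision for large k.
    total = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    )
    return total / math.factorial(k)


# One small and one large p value, e.g. from two omics layers (made-up numbers).
example = [0.001, 0.8]
for method in (fisher, stouffer, edgington):
    print(method.__name__, method(example))
```

For this example Fisher's method returns the smallest combined p value and Edgington's the largest, with Stouffer's in between, which matches the "emphasize small" versus "emphasize large" description in the tip box.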
Added a tutorial for MultiGSEA tool.