Compute differential expression (tumor versus normal in paired samples) for cancers #31
Output looks like this:
Tagging @ksimeono, who was interested in the colon adenocarcinoma (COAD) data.
Based on suggestions from @ksimeonov
`entrez_gene_id` was a float due to odd behavior by `df.merge` in pandas. As a result, the `float_format='%.4g'` option of `to_csv` wrote `entrez_gene_id` in exponent notation, irreversibly corrupting the IDs.
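The failure mode can be reproduced on toy data. The sketch below is not the notebook's code: the exact merge that triggered the problem is not shown in this thread, so it assumes one common cause, a left merge with unmatched rows, which introduces NaN and upcasts an integer column to `float64`. Every name and value other than `entrez_gene_id` and `float_format='%.4g'` is hypothetical.

```python
import pandas as pd

# Toy stand-ins; gene labels and ID values are made up for illustration.
expression = pd.DataFrame({
    "symbol": ["GENE_A", "GENE_B", "UNMAPPED"],
    "t_stat": [1.23456, -2.5, 0.1],
})
genes = pd.DataFrame({
    "symbol": ["GENE_A", "GENE_B"],
    "entrez_gene_id": [1, 100133144],
})

# Assumed cause: the unmatched row gets NaN, so the integer ID column
# is upcast to float64 by the merge.
merged = expression.merge(genes, on="symbol", how="left")

# float_format applies to every float column, including the upcast IDs,
# so a long ID is truncated to four significant digits.
corrupted_tsv = merged.to_csv(sep="\t", index=False, float_format="%.4g")

# One possible fix: drop unmapped rows and restore an integer dtype
# before writing, so float_format only touches the statistics.
fixed = merged.dropna(subset=["entrez_gene_id"]).astype({"entrez_gene_id": "int64"})
fixed_tsv = fixed.to_csv(sep="\t", index=False, float_format="%.4g")

print(corrupted_tsv)
print(fixed_tsv)
```

Dropping unmapped rows is only one option; whether that is acceptable depends on how the downstream analysis treats genes without an Entrez ID.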
Rerun with gene data created by cognoma#32. Should result in all genes having a symbol.
@gwaygenomics what do you think of the plot? The heatmap shows differential expression signatures for each cancer. Genes were reduced to 100 metagenes using NMF. Fill color represents the t-statistic.
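For readers unfamiliar with the NMF step, here is a rough sketch of how a samples-by-genes matrix can be reduced to a few non-negative components. It uses plain multiplicative updates written in NumPy, which is not necessarily the implementation or the settings used in the notebook; the function name `nmf`, the toy matrix, and the choice of 3 components (instead of 100) are assumptions for illustration.

```python
import numpy as np

def nmf(X, n_components, n_iter=200, seed=0, eps=1e-10):
    """Factor non-negative X (samples x genes) as W @ H.

    W is samples x components (sample scores per metagene) and
    H is components x genes (gene loadings per metagene).
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.random((n_samples, n_components)) + eps
    H = rng.random((n_components, n_genes)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative because it only
        # multiplies by ratios of non-negative quantities.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy expression matrix: 12 samples, 50 genes, random non-negative values.
X = np.random.default_rng(42).random((12, 50))
W, H = nmf(X, n_components=3)

print(W.shape, H.shape)
print("reconstruction error:", np.linalg.norm(X - W @ H))
```

In a setup like this, the columns of `W` would play the role of the metagene values that are then compared between tumor and normal samples.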
There is a lot going on in it! I am going to outline what it is and try to extract biology along the way.
I think a rough description of what is going on with the genes in each component would spark more biological discussion. Another thing to keep in mind is that the "normals" are actually "tumor adjacent" and are opportunistically extracted from "nearby" tissue when the surgeon can (hence there is no tumor-adjacent tissue for GBM). I think it's important not to consider this "normal" (Troester et al. 2016). To be clear, the terminology is OK, but thinking about this as normal tissue could be a trap!
A conceptual summary comparing the approach with Gross et al. would be good somewhere in the notebook, particularly if you link to that paper.
Agree with @gwaygenomics that some sort of biologically meaningful annotation of the metagenes would be beneficial. I'm out of my element in terms of what's possible, but grouping genes by pathway initially, instead of by metagene, could achieve something similar with inherent meaning. Along the same lines, expanded names for the cancers, rather than just TCGA acronyms, would improve readability.
@gwaygenomics we're using a paired t-test, about which the following has been said:
Thanks @cgreene, @ksimeono, & @gwaygenomics for the comments. It will be at least a week before I get around to addressing them.
Ah yes, good point!
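For reference on the paired t-test discussed in this thread, a minimal sketch of the statistic on matched tumor and normal values for one gene (or metagene) follows. It uses only the standard library; the function name `paired_t_statistic` and the toy values are hypothetical, and the notebook may well compute this with a statistics library instead.

```python
import math
import statistics

def paired_t_statistic(tumor, normal):
    """Return (t, degrees_of_freedom) for matched tumor/normal values.

    tumor[i] and normal[i] must come from the same patient. The test
    works on per-patient differences, so between-patient variation
    cancels out. Raises ZeroDivisionError if all differences are equal.
    """
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must contain one value per patient")
    if len(tumor) < 2:
        raise ValueError("need at least two tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1

# Toy values for four patients; a positive t means higher in tumor.
tumor = [5.1, 6.3, 4.8, 7.0]
normal = [4.0, 5.9, 4.1, 5.5]
t_stat, dof = paired_t_statistic(tumor, normal)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The sign convention here (tumor minus normal) is a choice made for this sketch; swapping the arguments flips the sign of the statistic.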
Note to untrack `data/complete/differential-expression.tsv.bz2` before merging.

This is something that @ksimeono, a cancer biologist, was interested in. It's potentially out of scope for Cognoma, but I think it's pretty useful.
No rush to merge, just wanted to get this up here.