Could CytoNorm impede discovery of novel clusters? #10

Open
vivek-verma202 opened this issue Jun 8, 2020 · 8 comments
Comments

@vivek-verma202

Hi!
If the training set comes from healthy controls and the aim is to discover novel clusters (using Diffcyt) that occur in cases but not in controls, could CytoNorm pre-processing wash out the signal?

Thanks,
Vivek

@tomashhurst

@vivek-verma202 it doesn't have to, but it's a bit complicated. In our implementation in Spectre we generate clusters using only stable markers to get the major population groups (e.g. Ly6G, CD19, CD4, CD3 etc. in mouse). Cluster-specific batch effects probably arise at the level of fundamental biological groups: most T cells would likely have similar batch effects, whereas the batch effects on T cells might be very different from those on eosinophils. So for alignment we only try to cluster eosinophils, neutrophils, monocytes, T cells, NK cells, and B cells. If the alignment is done using only these large population groups, then in the actual analysis afterwards you can cluster on all markers and still find novel clusters.
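The idea described above (define clusters from stable markers only, then align within each cluster) can be sketched in plain Python. This is not the CytoNorm or Spectre API. The threshold-based `assign_cluster`, the per-cluster median shift (a deliberately simple stand-in for CytoNorm's quantile-based alignment), and all marker values are assumptions made up for illustration.

```python
from statistics import median

# Hypothetical channel list; cluster assignment below only looks at CD3 and CD19.
ALL_CHANNELS = ["CD3", "CD19", "CD86"]

def assign_cluster(cell):
    """Toy stand-in for clustering on stable markers (thresholds are assumed)."""
    if cell["CD3"] > 2.0:
        return "T"
    if cell["CD19"] > 2.0:
        return "B"
    return "other"

def cluster_medians(cells, channels):
    """Median of every channel within every cluster."""
    grouped = {}
    for cell in cells:
        grouped.setdefault(assign_cluster(cell), []).append(cell)
    return {
        cl: {ch: median(c[ch] for c in members) for ch in channels}
        for cl, members in grouped.items()
    }

def align(cells, batch_ref, goal_ref, channels):
    """Per cluster and channel, shift values so the batch reference median
    matches the goal reference median. Raw values are kept alongside."""
    src = cluster_medians(batch_ref, channels)
    dst = cluster_medians(goal_ref, channels)
    out = []
    for cell in cells:
        cl = assign_cluster(cell)
        row = dict(cell)  # copy, so the raw values stay untouched
        for ch in channels:
            known = cl in src and cl in dst
            shift = dst[cl][ch] - src[cl][ch] if known else 0.0
            row[ch + "_aligned"] = cell[ch] + shift
        out.append(row)
    return out
```

In this sketch CD86 plays no part in assigning clusters, yet it still receives an aligned value, and the raw value is kept next to it so that later analysis can use either.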

Now the specific issue you raised is a good point: if the healthy controls are used as the reference samples, then the alignment of markers might get messed up, as some disease samples might have high levels of markers that aren't present in the healthy controls. One of the requirements of CytoNorm specified in the document is that the reference control needs to span the full range of the data. The implication is that, in an ideal world, you could use one of the 'disease' samples (or similar) as a reference so that all the activation markers etc. will be present. However, in practice this is difficult to do. The way we have been getting around this is, as above, to use just stable markers to create clusters for alignment, and then keep both the raw and aligned data in our dataset. Then in our analysis proper, we cluster on all the markers where we know the distribution is fairly similar between healthy and diseased (CD11b etc.), and then look within each cluster for novel patterns/bifurcations in the raw data for activation/novel markers (CD80/CD86 etc.). There are essentially two ways of using clustering: one is to cluster on everything and find new clusters 'appearing' in experimental groups; the other is to cluster on stable markers and then ask how each of those stable clusters has changed between experimental groups. The approach described above is the latter.
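One way to get a feel for the "span the full range" requirement is to count how many events in a sample lie outside the range that the reference covered for a given marker. The helper below is a hypothetical pre-check written for this thread, not part of CytoNorm, and it says nothing about how the package itself treats out-of-range values.

```python
def fraction_outside_reference(reference_values, sample_values):
    """Fraction of sample events below the reference minimum or above
    the reference maximum for one marker (hypothetical helper)."""
    lo, hi = min(reference_values), max(reference_values)
    outside = [v for v in sample_values if v < lo or v > hi]
    return len(outside) / len(sample_values)
```

A high fraction for an activation marker in disease samples would suggest that healthy-control references do not cover that marker well, which is the situation described above.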

@vivek-verma202
Author

@tomashhurst, thank you for your response!
I have 2 follow-up questions:

  1. If I use only stable markers to cluster for alignment, would the intensities of the unused channels also be corrected for batch effects? For instance, if I use CD56 and CD16 as stable markers to cluster for alignment, would a subsequent result (say, that NKp46 expression on NK cells differs between patients and controls) still be free of batch effects?

  2. Would you recommend univariate signal alignment (like fdaNorm) as a preprocessing step to de-noise the data prior to using CytoNorm?

@emmanuelaaaaa

Hello,

I have been having similar questions, so thanks @tomashhurst for the input. However, as far as I understand, if only some markers (the "stable" ones) are used for the training, then only those markers will be normalised and the rest will not even appear in the normalised FCS files. So in your analyses, do you just append the rest of the markers (the non-normalised ones) to the normalised FCS files?
Also, if you are suggesting using only some markers for the normalisation, does that mean that you tried using all of them but the normalisation didn't work as well?

Thanks again for your time.
Best,
Emma

@SofieVG
Member

SofieVG commented Jul 14, 2020 via email

@emmanuelaaaaa

Hi Sofie,

Thank you so much for the quick reply. That's very helpful! Should we then also use a different transformList for each step (with the "stable" channels for prepareFlowSOM and all the channels for CytoNorm.train)?

Best,
Emma

@SofieVG
Member

SofieVG commented Jul 14, 2020 via email

@emmanuelaaaaa

Thanks!
I know I'm going a bit off topic here, but I would be grateful if you could also elaborate on what the transformList is actually doing. It's not completely clear to me from the documentation: is it an arcsinh transformation applied before the normalisation, with the values then returned to their original (non-transformed) range before export to FCS?
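The question describes a transform-then-reverse round trip. As a sketch of just that round trip, and not a statement of what CytoNorm does internally, an arcsinh transform and its inverse look like this in plain Python. The function names and the cofactor of 5 are assumptions chosen for illustration.

```python
import math

COFACTOR = 5.0  # assumed value; the appropriate cofactor depends on the instrument

def forward(x, cofactor=COFACTOR):
    """arcsinh transform: compresses large intensities, near-linear around zero."""
    return math.asinh(x / cofactor)

def reverse(y, cofactor=COFACTOR):
    """Inverse of forward(): maps transformed values back to the original scale."""
    return math.sinh(y) * cofactor
```

Applying `reverse` to the output of `forward` with the same cofactor recovers the original value up to floating-point error, which is what lets any processing happen on the transformed scale while exported files stay on the original scale.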

@SofieVG
Member

SofieVG commented Jul 14, 2020 via email
