Clarifications on CytoNorm behaviour for various datasets #48

Open
rprops opened this issue Jan 6, 2025 · 4 comments

rprops commented Jan 6, 2025

Hi -

I've been playing around with the package on various datasets (QC beads / biological data) and I've encountered some things that require expert guidance.

Artefacts introduced through normalization, their behaviour and how to manage them.

Below you can see the normalization of QC beads that were measured on different instruments (same model) under identical PMTV settings. The input FCS data (fcs_merge) are normalized (µ = 0, sd = 1).

Regardless of the SOM settings (or even when skipping SOM), artefacts emerge in bivariate scatter plots that distort the original multivariate distribution. These were not picked up in a univariate histogram inspection.

Is this known behaviour, and if so, how does one manage these distortions?

library(CytoNorm)

# Train the model on the merged bead data; batch labels come from ref_data.
cytonorm_obj <- suppressMessages(
  CytoNorm.train(
    files = fcs_merge,
    labels = ref_data$batch_id,
    channels = param,
    FlowSOM.params = list(
      nCells = 1e6,
      xdim = 15,
      ydim = 15,
      nClus = 5,
      scale = FALSE
    ),
    normMethod.train = QuantileNorm.train,
    normParams = list(goal = "mean"),  # goal is the mean over batches
    seed = 777,
    transformList = NULL,              # input is already normalized
    clean = TRUE,
    recompute = TRUE,
    verbose = FALSE
  )
)

One sample from one instrument, normalized according to batch effects (red after normalization, black before normalization)

image

Two samples, one from each instrument, aligned. (red sample 1 normalized, black sample 2 normalized)

image
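
One mechanism that can produce this kind of pattern is sketched below on simulated data, not on the beads above. It assumes that the two batches contain the dim and bright populations in different proportions, which may or may not apply here. The helper names (`sim_batch`, `quantile_map`, `off_diag`) are hypothetical, and the mapping is a simplified stand-in for per-channel quantile normalization within a single cluster, not CytoNorm's actual code. Each channel is mapped independently, so the marginal distributions match the goal while events that were dim in both channels can end up bright in only one of them.

```r
set.seed(1)

# Hypothetical two-channel bead data: a dim and a bright population,
# with independent noise per channel inside each population.
sim_batch <- function(n, frac_bright, shift = 0) {
  bright <- runif(n) < frac_bright
  centre <- ifelse(bright, 4, 0) + shift
  cbind(BL1 = rnorm(n, centre, 0.3), BL3 = rnorm(n, centre, 0.3))
}

ref   <- sim_batch(20000, frac_bright = 0.5)               # goal distribution
batch <- sim_batch(20000, frac_bright = 0.2, shift = 0.2)  # batch to align

# Per-channel quantile mapping: monotone spline from the batch quantiles
# to the goal quantiles (simplified stand-in, assumption of this sketch).
quantile_map <- function(x, goal, probs = seq(0.01, 0.99, by = 0.01)) {
  f <- splinefun(quantile(x, probs, names = FALSE),
                 quantile(goal, probs, names = FALSE),
                 method = "monoH.FC")
  f(x)
}

norm <- cbind(BL1 = quantile_map(batch[, "BL1"], ref[, "BL1"]),
              BL3 = quantile_map(batch[, "BL3"], ref[, "BL3"]))

# Fraction of events that are bright in one channel but dim in the other
# (cut-off at 2, halfway between the simulated populations).
off_diag <- function(m) mean((m[, "BL1"] > 2) != (m[, "BL3"] > 2))

off_before <- off_diag(batch)
off_after  <- off_diag(norm)
cat(sprintf("off-diagonal fraction before: %.3f, after: %.3f\n",
            off_before, off_after))
```

In this toy setting the univariate histograms of the normalized batch line up with the goal, yet the bivariate plot gains off-diagonal events. Comparing population proportions per batch (and per cluster) before training is one way to check whether this mechanism is plausible for a given dataset.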

Goal-based normalization still normalizes goal batch data

How is goal-based alignment implemented for batch_ids? My interpretation is that all other batches are aligned to the goal batch, meaning that the goal batch itself is left unchanged by CytoNorm.normalize. However, when I tried this out, the goal batch was still normalized. See the figures below.

Can this be clarified?

An example of a biological sample here below: (red after normalization, black before normalization)

image

An example of a bead sample here below: (red after normalization, black before normalization)

image
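
For reference, the training call posted above sets normParams = list(goal = "mean"). A minimal base R sketch on simulated data of why that choice matters follows. It assumes, without having checked the CytoNorm source, that a "mean" goal corresponds to the average of the per-batch quantiles and that a batch-specific goal corresponds to that batch's own quantiles; `map_to` and the other names are hypothetical. Under these assumptions every batch moves with a mean goal, while a batch mapped onto its own quantiles is left unchanged.

```r
set.seed(2)

probs   <- seq(0.01, 0.99, by = 0.01)
batches <- list(A = rnorm(5000, 0, 1), B = rnorm(5000, 0.5, 1.2))

# Quantiles per batch: one column per batch, one row per probability.
q <- sapply(batches, quantile, probs = probs, names = FALSE)

goal_mean <- rowMeans(q)  # assumed meaning of goal = "mean"
goal_A    <- q[, "A"]     # assumed meaning of using batch A as the goal

# Monotone spline from a batch's quantiles to the goal quantiles.
map_to <- function(x, from_q, goal_q) {
  splinefun(from_q, goal_q, method = "monoH.FC")(x)
}

# Largest change applied to batch A under each goal.
shift_mean <- max(abs(map_to(batches$A, q[, "A"], goal_mean) - batches$A))
shift_A    <- max(abs(map_to(batches$A, q[, "A"], goal_A)    - batches$A))
cat(sprintf("max shift of batch A: mean goal %.3f, own goal %.3g\n",
            shift_mean, shift_A))
```

If the goal batch still changes when its own label is used as the goal, the difference would have to come from something this sketch leaves out, for example per-cluster splines fitted on training files other than the file being normalized.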

Advice on SOM clustering (yes/no)

I find in the instructions that SOM clustering can be skipped when there is a low number of discrete populations. I've implemented this, but I see similar performance and artefacts. Can there be more precise guidance on when to use SOM and when not to? Have any benchmarks been done in this regard?

image

I can supply input data via email if required.

Thanks in advance for taking the time to address my remarks!

Ruben

@FMKerckhof

On the both-batch normalization issue: our in-house Python dev, @prubbens, ran into a similar issue with CytoNormPy (TarikExner/CytoNormPy#11 (comment)). Could this be an issue here as well, and if so, how should we parameterize CytoNorm.train to obtain the same?

SamGG commented Jan 8, 2025

Hi. I am not part of the dev team.
Let's focus on the 1st figure. I do understand the two density plots. I don't understand the BL1 vs BL3 plot. Do BL1 and BL3 carry the same marker?
What I would do is make a similar plot showing BL1 before vs after normalization (and the same plot for BL3). Then I would see and understand what the transformation applied to the data looks like for BL1 (resp. BL3). Then I would relate this BL1 (resp. BL3) transformation to what a quantile normalization does.
See you then.
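
A minimal base R sketch of such a before-vs-after check, on simulated stand-in data. It assumes that `before` and `after` are the expression matrices of the same file with events in the same order, which should be verified for real data. `transfer_summary` and `plot_transfer` are hypothetical helper names, not CytoNorm functions.

```r
set.seed(3)

# Stand-in data: in practice these would be the expression matrices of one
# file read before and after normalization (same events, same order).
before <- cbind(BL1 = rnorm(2000), BL3 = rnorm(2000))
after  <- cbind(BL1 = before[, "BL1"] * 1.1 + 0.2,   # toy linear shift
                BL3 = pmax(before[, "BL3"], -1))     # toy clipping

# Summarize the per-channel transfer function: is it monotone, and how far
# does it move events at most?
transfer_summary <- function(before, after, channel) {
  o <- order(before[, channel])
  x <- before[o, channel]
  y <- after[o, channel]
  list(monotone = all(diff(y) >= 0), max_shift = max(abs(y - x)))
}

# One panel per channel: value before (x) against value after (y).
plot_transfer <- function(before, after) {
  op <- par(mfrow = c(1, ncol(before)))
  on.exit(par(op))
  for (ch in colnames(before)) {
    plot(before[, ch], after[, ch], pch = ".",
         xlab = paste(ch, "before"), ylab = paste(ch, "after"))
    abline(0, 1, col = "red")  # identity line: no change
  }
}

s_bl1 <- transfer_summary(before, after, "BL1")
s_bl3 <- transfer_summary(before, after, "BL3")
if (interactive()) plot_transfer(before, after)
```

If a single quantile spline was applied to a channel, its panel shows one monotone curve. Several distinct curves in one panel would suggest that different clusters received different splines.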

@FMKerckhof

Hi @SamGG - BL1 and BL3 are just green and red fluorescence on the 488 nm laser (530/30 and 659/40). The beads that were used were Thermo Attune NXT performance tracking beads, which have 4 intensities across each detector and laser. With the PMTV settings and thresholds that we used to acquire them, we can roughly discriminate two populations in the detectors that were specified.

We will look into generating the BL1 pre/post bivariate plot (and similarly for BL3) as you suggested and get back to you here.

SamGG commented Jan 15, 2025

Hi @FMKerckhof. What's up? Send me an e-mail to share the FCS files.
