Dear developers,
Thank you for providing such an amazing suite of tools! I have a question regarding SCTransform and its use with regression.
I am working with a dataset of ~100K cells spanning a few different conditions, including technical replicates. The dataset consists of a single cell type (a cell line) derived from a fast-growing, relatively homogeneous tumor.
I’ve observed that merging samples and then running SCTransform results in minimal batch effects, whereas running SCTransform separately for each sample introduces more noise. However, one technical replicate still exhibits a distinct batch cluster that becomes apparent when using more than 10 principal components (PCs) for clustering. This particular replicate has significantly higher UMI and gene counts per cell.
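One way to check whether the PCs beyond 10 are tracking sequencing depth rather than biology is to correlate each PC with a per-cell depth covariate. The sketch below is illustrative only: `pc_covariate_cor` is a hypothetical helper, and the data are simulated stand-ins, not values from this dataset. With a Seurat object, the inputs could presumably be taken from `Embeddings(RBL_merg, "pca")` and `log10(RBL_merg$nCount_RNA)`.

```r
# Correlate each principal component with a per-cell covariate (e.g. depth).
# `embeddings`: cells x PCs numeric matrix; `covariate`: one numeric value per cell.
# Hypothetical helper, not part of Seurat.
pc_covariate_cor <- function(embeddings, covariate) {
  stopifnot(nrow(embeddings) == length(covariate))
  apply(embeddings, 2, function(pc) cor(pc, covariate, method = "spearman"))
}

# Simulated stand-in data: 200 cells at normal depth, 100 cells from a deeper replicate.
set.seed(1)
n_cells <- 300
depth <- c(rnorm(200, mean = 4.0, sd = 0.1),
           rnorm(100, mean = 4.4, sd = 0.1))  # log10 UMI per cell
emb <- matrix(rnorm(n_cells * 5), ncol = 5,
              dimnames = list(NULL, paste0("PC_", 1:5)))
emb[, 3] <- emb[, 3] + 5 * (depth - mean(depth))  # force PC_3 to track depth

depth_cor <- pc_covariate_cor(emb, depth)
print(round(depth_cor, 2))
```

A PC with a large absolute correlation here is a candidate for being depth-driven. Repeating the check with `nFeature_RNA` as the covariate would show whether the same PCs also follow gene counts.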
I’ve tried several approaches, including down-sampling at both the matrix and molecule levels, which helps only marginally: the batch cluster persists when using more than 10 PCs. I’ve also attempted integration methods (Scanorama, Harmony, etc.), but these tend to over-correct.
When using SCTransform as shown below, the problematic batch cluster is corrected, and the replicates and conditions align well with expectations:
    RBL_merg <- SCTransform(
      RBL_merg,
      vars.to.regress = c("S.Score", "G2M.Score", "percent.mt", "nFeature_RNA", "nCount_RNA"),
      conserve.memory = TRUE,
      verbose = TRUE
    )
My understanding is that running SCTransform on the merged dataset leads to a consistent per-gene residual calculation across all cells, which in my case is giving good results for comparisons across samples.
I am aware that SCTransform inherently corrects for nCount_RNA differences. However, explicitly specifying both nFeature_RNA and nCount_RNA in the vars.to.regress option yields robust results for us (using just one does not), with the complete dissolution of a problematic batch cluster. I imagine this combined adjustment applies a more liberal correction, which seems to suit our data well.
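Since `nFeature_RNA` and `nCount_RNA` are usually strongly correlated, it may be worth quantifying how much independent information the second covariate adds. The following is a minimal sketch using a variance inflation factor (VIF); `covariate_vif` is a hypothetical helper and the metadata are simulated, so the numbers say nothing about this dataset. On real data, `meta` would presumably be `RBL_merg@meta.data`.

```r
# Variance inflation factor of `target` given the other covariates:
# VIF = 1 / (1 - R^2), where R^2 comes from regressing `target` on `others`.
# Hypothetical helper, not part of Seurat.
covariate_vif <- function(meta, target, others) {
  fit <- lm(reformulate(others, response = target), data = meta)
  1 / (1 - summary(fit)$r.squared)
}

# Simulated stand-in metadata: gene counts grow sub-linearly with UMI counts.
set.seed(42)
meta <- data.frame(nCount_RNA = round(10^rnorm(500, mean = 4, sd = 0.2)))
meta$nFeature_RNA <- round(60 * sqrt(meta$nCount_RNA) * exp(rnorm(500, sd = 0.05)))

vif <- covariate_vif(meta, "nFeature_RNA", "nCount_RNA")
print(vif)
```

A high VIF would suggest that the two covariates carry mostly the same information, so adding `nFeature_RNA` mainly corrects for the part of gene count not explained by UMI count (roughly, library complexity). That could be consistent with one replicate differing in complexity as well as depth, but this is an interpretation to test rather than a conclusion.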
As I understand it, this approach would only obscure biological signal if global RNA abundance were of interest, which is not the case for these data. If anyone has thoughts on this approach, I'd be really keen to hear them.