Follow-up on Bulk RNA-Seq Normalization Methods Using Seurat #9578

Open
fmbetul opened this issue Dec 26, 2024 · 0 comments

fmbetul commented Dec 26, 2024

Hi Seurat Team,

I recently came across the closed issue #826 and have been using Seurat for bulk RNA-seq analysis as well. The discussion in the issue was very helpful, and I wanted to ask a follow-up question.

Initially, I followed the approach suggested in the thread:

combined_seurat <- CreateSeuratObject(counts = combined_tpm_data, project = "combined_TPM", meta.data = combined_tpm_metadata)  
combined_seurat@assays$RNA$data <- log(combined_seurat@assays$RNA$counts + 1)  
combined_seurat <- FindVariableFeatures(combined_seurat)  
combined_seurat <- ScaleData(combined_seurat)  
combined_seurat <- RunPCA(combined_seurat)  

The PCA plot from this approach reflected clustering patterns aligned with our biological expectations.

I then tested using the NormalizeData function, commonly used for single-cell RNA-seq data:

combined_seurat_v2 <- CreateSeuratObject(counts = combined_tpm_data, project = "combined_TPM", meta.data = combined_tpm_metadata)  
combined_seurat_v2 <- NormalizeData(combined_seurat_v2)  
combined_seurat_v2 <- FindVariableFeatures(combined_seurat_v2)  
combined_seurat_v2 <- ScaleData(combined_seurat_v2)  
combined_seurat_v2 <- RunPCA(combined_seurat_v2)  

Interestingly, the PCA plot generated from this approach appears to better separate our sample groups.

I am unsure if using the NormalizeData method, which is intended for single-cell analysis, is appropriate for bulk RNA-seq data. Would this be considered a legitimate approach, or are there concerns I should be aware of?
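As a point of reference, the two pipelines differ mainly in the scaling applied before the log transform. The sketch below uses base R only and does not call Seurat. It assumes NormalizeData's documented default (LogNormalize with scale.factor = 10000) and a made-up toy matrix whose columns sum to exactly 1e6, which real TPM data may not do after gene filtering. The names `raw`, `tpm`, `log_plain`, and `log_norm` are hypothetical.

```r
# Toy TPM-like matrix (genes x samples); values are invented for illustration.
set.seed(1)
raw <- matrix(rexp(20, rate = 0.01), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))
tpm <- sweep(raw, 2, colSums(raw), "/") * 1e6  # rescale so each column sums to 1e6

# Approach 1: plain log(x + 1) on the TPM values.
log_plain <- log1p(tpm)

# Approach 2: base-R re-implementation of the LogNormalize formula,
# log1p(value / column total * scale factor), assuming scale.factor = 10000.
scale_factor <- 1e4
log_norm <- log1p(sweep(tpm, 2, colSums(tpm), "/") * scale_factor)

# When every column sums to 1e6, approach 2 reduces to log1p(TPM / 100).
print(all.equal(log_norm, log1p(tpm / 100)))
```

Under these assumptions, the second approach is the first with the values divided by 100 before the log. That compresses highly expressed genes less relative to lowly expressed ones, which may shift which features are selected as variable and how samples separate in the PCA.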

Below is my sessionInfo() for reference:

R version 4.3.2 (2023-10-31)  
Platform: aarch64-apple-darwin20 (64-bit)  
Running under: macOS Sonoma 14.7  

attached base packages:  
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:  
[1] Seurat_5.0.1       SeuratObject_5.0.1  

loaded via a namespace (and not attached):  
[1] Matrix_1.6-5       ggplot2_3.5.1      dplyr_1.1.4        data.table_1.14.10  
[5] Rcpp_1.0.12        future_1.33.1      lattice_0.21-9     patchwork_1.2.0  

I’d greatly appreciate your insights. Thank you for your time!

Best regards,
Betul
