You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am working on integrating a large number of single-cell RNA sequencing (scRNA-seq) libraries—about 110 in total—with varying read coverage and cell counts. After performing basic filtering, we have approximately 630K cells.
I have set up a basic Seurat V5 pipeline, which includes reading each library as a Seurat object, saving them into a list, and performing the following steps on each library one at a time: FindVariableFeatures, normalization, and sketching. After that, I proceed to the integration vignette.
However, the "FindIntegrationAnchors" step is taking an unusually long time—over three days. I am unsure if this duration is expected or if there are ways to optimize the following code to make the process faster. Any help is much appreciated.
`
I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files
#Sample cells using sketch data function and perform downstream analysis
print("Step 5: normalize and identify variable features for each dataset independently")
filtered_seurat.list <- lapply(X = seurat_object_list, FUN = function(x) {
print(unique([email protected]$seq_folder))
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2500)
x <- SketchData(x, n = 5000,method = "LeverageScore", sketched.assay = "sketch")
x <- NormalizeData(x)
print("Step 6: Combine the sketched libraries into a list and find integration anchors")
features <- SelectIntegrationFeatures(object.list = filtered_seurat.list)
filtered_seurat.list <- lapply(X = filtered_seurat.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
print("Step 11:# Run the standard workflow ProjectData on the sketched")
Project_seurat_sketch_inte <- ProjectData(object = Project_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
sketched.reduction = "integrated.rpca.full",
full.reduction = "integrated.rpca.full", dims = 1:30,
refdata = list(celltype.full = "celltype.manual"))
saveRDS(filtered_seurat.list,"./filtered_seurat.ProjecData_seurat_sketch_inte.RDS")
print("Step 11:# Run the standard workflow RunUMAP on the sketched")
Project_seurat_sketch_inte <- RunUMAP(Project_seurat_sketch_inte,
reduction = "integrated.rpca.full", dims = 1:30,
reduction.name = "umap.full",reduction.key = "UMAP_full_")
Discussed in #9606
Originally posted by sentisci January 7, 2025
Hi,
I am working on integrating a large number of single-cell RNA sequencing (scRNA-seq) libraries—about 110 in total—with varying read coverage and cell counts. After performing basic filtering, we have approximately 630K cells.
I have set up a basic Seurat V5 pipeline, which includes reading each library as a Seurat object, saving them into a list, and performing the following steps on each library one at a time: FindVariableFeatures, normalization, and sketching. After that, I proceed to the integration vignette.
However, the "FindIntegrationAnchors" step is taking an unusually long time—over three days. I am unsure if this duration is expected or if there are ways to optimize the following code to make the process faster. Any help is much appreciated.
`
Load data from h5 files
options(future.globals.maxSize = 110000 * 1024^2)
print("STage 3 Load data from h5 files ")
SampleNames_df <- data.frame(SampleNames)
seurat_object_list <- c()
I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files
for (i in 1:nrow(SampleNames_df)) {
print(i)
metadata_cols = c( "seq_folder","Patient","nUMI_RNA","nGene","percent_mt",
"percent_ribo","log10GenesPerUMI","cells",
"nUMI_RNA_1og10","nCount_RNA","nFeature_RNA","Tissue")
data_mat <- readRDS(paste0(SampleNames_df[i,"GEX_folders"],"RNA_ADT_normalized.RDS"))
data_mat <- RenameCells(object = data_mat,
add.cell.id = gsub("^GEX_FB|^GEX_|^SCAF_", "",
unique([email protected]$seq_folder)))
[email protected]$Patient <- SampleNames_df[i,"TumorCode"]
[email protected]$Tissue <- SampleNames_df[i,"Tissue"]
seurat_object_list[[i]] <- data_mat
}
#Sample cells using sketch data function and perform downstream analysis
print("Step 5: normalize and identify variable features for each dataset independently")
filtered_seurat.list <- lapply(X = seurat_object_list, FUN = function(x) {
print(unique([email protected]$seq_folder))
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2500)
x <- SketchData(x, n = 5000,method = "LeverageScore", sketched.assay = "sketch")
x <- NormalizeData(x)
})
saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched.RDS")
print("Step 6: Combine the sketched libraries into a list and find integration anchors")
features <- SelectIntegrationFeatures(object.list = filtered_seurat.list)
filtered_seurat.list <- lapply(X = filtered_seurat.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched_scale_PCA.RDS")
print("Step 7: Perform integration using sketched cells using integration anchors")
anchors <- FindIntegrationAnchors(object.list = filtered_seurat.list,
anchor.features = features, reduction = "rpca")
print("Step 8: Make an integrated assay using sketched cells")
filt_seurat_sketch_inte <- IntegrateData(anchorset = anchors)
DefaultAssay(filt_seurat_sketch_inte) <- "integrated"
print("Step 9:# Run the standard workflow for visualization and clustering on the sketched")
filt_seurat_sketch_inte <- ScaleData(filt_seurat_sketch_inte,
verbose = FALSE)
filt_seurat_sketch_inte <- RunPCA(filt_seurat_sketch_inte,
npcs = 30, verbose = FALSE)
filt_seurat_sketch_inte <- RunUMAP(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindNeighbors(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindClusters(filt_seurat_sketch_inte,
resolution = 0.5)
saveRDS(filt_seurat_sketch_inte,"./filtered_seurat.integrated_sketched_cells_only.RDS")
print(" Visualization")
p1 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "Patient")
p2 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
p1 + p2
pdf("UMAP_sketched_cells.pdf", height = 20, width = 30)
p1 + p2
dev.off()
print("Step 10:# Run the standard workflow ProjectIntegration on the sketched")
DefaultAssay(filt_seurat_sketch_inte) <- "rna"
Project_seurat_sketch_inte <- ProjectIntegration(object = filt_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
reduction = "integrated.rpca")
saveRDS(filtered_seurat.list,"./filtered_seurat.ProjectIntegration_seurat_sketch_inte.RDS")
print("Step 11:# Run the standard workflow ProjectData on the sketched")
Project_seurat_sketch_inte <- ProjectData(object = Project_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
sketched.reduction = "integrated.rpca.full",
full.reduction = "integrated.rpca.full", dims = 1:30,
refdata = list(celltype.full = "celltype.manual"))
saveRDS(filtered_seurat.list,"./filtered_seurat.ProjecData_seurat_sketch_inte.RDS")
print("Step 11:# Run the standard workflow RunUMAP on the sketched")
Project_seurat_sketch_inte <- RunUMAP(Project_seurat_sketch_inte,
reduction = "integrated.rpca.full", dims = 1:30,
reduction.name = "umap.full",reduction.key = "UMAP_full_")
Project_seurat_sketch_inte$celltype.full <- factor(Project_seurat_sketch_inte$celltype.full,
ordered = TRUE,
levels = sort(unique(as.numeric(filtered_seurat_inte$celltype.full))))
Idents(Project_seurat_sketch_inte) <- "celltype.full"
print(" Visualization")
p1 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "Patient",raster=TRUE)
p2 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "celltype.full",label = TRUE,repel = TRUE, raster=TRUE)
pdf("UMAP.pdf", height = 20, width = 30)
p1 + p2
dev.off()
print("Save the final RDS")
saveRDS(Project_seurat_sketch_inte,"./all_filtered_seurat_normalized_sketch_integrated.RDS")
`
The text was updated successfully, but these errors were encountered: