Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

sentisci · 2025-01-08T16:29:51Z

Discussed in #9606

^{Originally posted by sentisci January 7, 2025}
Hi,

I am working on integrating a large number of single-cell RNA sequencing (scRNA-seq) libraries—about 110 in total—with varying read coverage and cell counts. After performing basic filtering, we have approximately 630K cells.

I have set up a basic Seurat V5 pipeline, which includes reading each library as a Seurat object, saving them into a list, and performing the following steps on each library one at a time: FindVariableFeatures, normalization, and sketching. After that, I proceed to the integration vignette.

However, the "FindIntegrationAnchors" step is taking an unusually long time—over three days. I am unsure if this duration is expected or if there are ways to optimize the following code to make the process faster. Any help is much appreciated.
`

Load data from h5 files

options(future.globals.maxSize = 110000 * 1024^2)

print("STage 3 Load data from h5 files ")

SampleNames_df <- data.frame(SampleNames)
seurat_object_list <- c()

I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files

for (i in 1:nrow(SampleNames_df)) {

print(i)
metadata_cols = c( "seq_folder","Patient","nUMI_RNA","nGene","percent_mt",
"percent_ribo","log10GenesPerUMI","cells",
"nUMI_RNA_1og10","nCount_RNA","nFeature_RNA","Tissue")

data_mat <- readRDS(paste0(SampleNames_df[i,"GEX_folders"],"RNA_ADT_normalized.RDS"))
data_mat <- RenameCells(object = data_mat,
add.cell.id = gsub("^GEX_FB|^GEX_|^SCAF_", "",
unique([email protected]$seq_folder)))
[email protected]$Patient <- SampleNames_df[i,"TumorCode"]
[email protected]$Tissue <- SampleNames_df[i,"Tissue"]

seurat_object_list[[i]] <- data_mat
}

#Sample cells using sketch data function and perform downstream analysis

print("Step 5: normalize and identify variable features for each dataset independently")
filtered_seurat.list <- lapply(X = seurat_object_list, FUN = function(x) {
print(unique([email protected]$seq_folder))
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2500)
x <- SketchData(x, n = 5000,method = "LeverageScore", sketched.assay = "sketch")
x <- NormalizeData(x)

})

saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched.RDS")

print("Step 6: Combine the sketched libraries into a list and find integration anchors")
features <- SelectIntegrationFeatures(object.list = filtered_seurat.list)
filtered_seurat.list <- lapply(X = filtered_seurat.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})

saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched_scale_PCA.RDS")

print("Step 7: Perform integration using sketched cells using integration anchors")
anchors <- FindIntegrationAnchors(object.list = filtered_seurat.list,
anchor.features = features, reduction = "rpca")

print("Step 8: Make an integrated assay using sketched cells")
filt_seurat_sketch_inte <- IntegrateData(anchorset = anchors)

DefaultAssay(filt_seurat_sketch_inte) <- "integrated"

print("Step 9:# Run the standard workflow for visualization and clustering on the sketched")

filt_seurat_sketch_inte <- ScaleData(filt_seurat_sketch_inte,
verbose = FALSE)
filt_seurat_sketch_inte <- RunPCA(filt_seurat_sketch_inte,
npcs = 30, verbose = FALSE)
filt_seurat_sketch_inte <- RunUMAP(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindNeighbors(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindClusters(filt_seurat_sketch_inte,
resolution = 0.5)

saveRDS(filt_seurat_sketch_inte,"./filtered_seurat.integrated_sketched_cells_only.RDS")

print(" Visualization")
p1 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "Patient")
p2 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
p1 + p2
pdf("UMAP_sketched_cells.pdf", height = 20, width = 30)
p1 + p2
dev.off()

print("Step 10:# Run the standard workflow ProjectIntegration on the sketched")
DefaultAssay(filt_seurat_sketch_inte) <- "rna"

Project_seurat_sketch_inte <- ProjectIntegration(object = filt_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
reduction = "integrated.rpca")

saveRDS(filtered_seurat.list,"./filtered_seurat.ProjectIntegration_seurat_sketch_inte.RDS")

print("Step 11:# Run the standard workflow ProjectData on the sketched")
Project_seurat_sketch_inte <- ProjectData(object = Project_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
sketched.reduction = "integrated.rpca.full",
full.reduction = "integrated.rpca.full", dims = 1:30,
refdata = list(celltype.full = "celltype.manual"))
saveRDS(filtered_seurat.list,"./filtered_seurat.ProjecData_seurat_sketch_inte.RDS")

print("Step 11:# Run the standard workflow RunUMAP on the sketched")
Project_seurat_sketch_inte <- RunUMAP(Project_seurat_sketch_inte,
reduction = "integrated.rpca.full", dims = 1:30,
reduction.name = "umap.full",reduction.key = "UMAP_full_")

Project_seurat_sketch_inte$celltype.full <- factor(Project_seurat_sketch_inte$celltype.full,
ordered = TRUE,
levels = sort(unique(as.numeric(filtered_seurat_inte$celltype.full))))

Idents(Project_seurat_sketch_inte) <- "celltype.full"

print(" Visualization")
p1 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "Patient",raster=TRUE)
p2 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "celltype.full",label = TRUE,repel = TRUE, raster=TRUE)

pdf("UMAP.pdf", height = 20, width = 30)
p1 + p2
dev.off()

print("Save the final RDS")
saveRDS(Project_seurat_sketch_inte,"./all_filtered_seurat_normalized_sketch_integrated.RDS")
`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

sentisci commented Jan 8, 2025

Load data from h5 files

I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files

Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

Comments

sentisci commented Jan 8, 2025

Discussed in #9606

Load data from h5 files

I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files