Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taking long time to Integrate 110 scRNAseq libraries using Sketch and FindIntegrationAnchors functions. #9610

Open
sentisci opened this issue Jan 8, 2025 Discussed in #9606 · 0 comments

Comments

@sentisci
Copy link

sentisci commented Jan 8, 2025

Discussed in #9606

Originally posted by sentisci January 7, 2025
Hi,

I am working on integrating a large number of single-cell RNA sequencing (scRNA-seq) libraries—about 110 in total—with varying read coverage and cell counts. After performing basic filtering, we have approximately 630K cells.

I have set up a basic Seurat V5 pipeline, which includes reading each library as a Seurat object, saving them into a list, and performing the following steps on each library one at a time: FindVariableFeatures, normalization, and sketching. After that, I proceed to the integration vignette.

However, the "FindIntegrationAnchors" step is taking an unusually long time—over three days. I am unsure if this duration is expected or if there are ways to optimize the following code to make the process faster. Any help is much appreciated.
`

Load data from h5 files

options(future.globals.maxSize = 110000 * 1024^2)

print("STage 3 Load data from h5 files ")

SampleNames_df <- data.frame(SampleNames)
seurat_object_list <- c()

I have read the libraries one at a time into a seurat object and saved them as RDS. Hence instead of reading it from h5 I am reading hte individual RDS files

for (i in 1:nrow(SampleNames_df)) {

print(i)
metadata_cols = c( "seq_folder","Patient","nUMI_RNA","nGene","percent_mt",
"percent_ribo","log10GenesPerUMI","cells",
"nUMI_RNA_1og10","nCount_RNA","nFeature_RNA","Tissue")

data_mat <- readRDS(paste0(SampleNames_df[i,"GEX_folders"],"RNA_ADT_normalized.RDS"))
data_mat <- RenameCells(object = data_mat,
add.cell.id = gsub("^GEX_FB
|^GEX_|^SCAF_", "",
unique([email protected]$seq_folder)))
[email protected]$Patient <- SampleNames_df[i,"TumorCode"]
[email protected]$Tissue <- SampleNames_df[i,"Tissue"]

seurat_object_list[[i]] <- data_mat
}

#Sample cells using sketch data function and perform downstream analysis

print("Step 5: normalize and identify variable features for each dataset independently")
filtered_seurat.list <- lapply(X = seurat_object_list, FUN = function(x) {
print(unique([email protected]$seq_folder))
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2500)
x <- SketchData(x, n = 5000,method = "LeverageScore", sketched.assay = "sketch")
x <- NormalizeData(x)

})

saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched.RDS")

print("Step 6: Combine the sketched libraries into a list and find integration anchors")
features <- SelectIntegrationFeatures(object.list = filtered_seurat.list)
filtered_seurat.list <- lapply(X = filtered_seurat.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})

saveRDS(filtered_seurat.list,"./filtered_seurat.list_Sketched_scale_PCA.RDS")

print("Step 7: Perform integration using sketched cells using integration anchors")
anchors <- FindIntegrationAnchors(object.list = filtered_seurat.list,
anchor.features = features, reduction = "rpca")

print("Step 8: Make an integrated assay using sketched cells")
filt_seurat_sketch_inte <- IntegrateData(anchorset = anchors)

DefaultAssay(filt_seurat_sketch_inte) <- "integrated"

print("Step 9:# Run the standard workflow for visualization and clustering on the sketched")

filt_seurat_sketch_inte <- ScaleData(filt_seurat_sketch_inte,
verbose = FALSE)
filt_seurat_sketch_inte <- RunPCA(filt_seurat_sketch_inte,
npcs = 30, verbose = FALSE)
filt_seurat_sketch_inte <- RunUMAP(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindNeighbors(filt_seurat_sketch_inte,
reduction = "pca", dims = 1:30)
filt_seurat_sketch_inte <- FindClusters(filt_seurat_sketch_inte,
resolution = 0.5)

saveRDS(filt_seurat_sketch_inte,"./filtered_seurat.integrated_sketched_cells_only.RDS")

print(" Visualization")
p1 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "Patient")
p2 <- DimPlot(filt_seurat_sketch_inte, reduction = "umap",
group.by = "seurat_annotations", label = TRUE,
repel = TRUE)
p1 + p2
pdf("UMAP_sketched_cells.pdf", height = 20, width = 30)
p1 + p2
dev.off()

print("Step 10:# Run the standard workflow ProjectIntegration on the sketched")
DefaultAssay(filt_seurat_sketch_inte) <- "rna"

Project_seurat_sketch_inte <- ProjectIntegration(object = filt_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
reduction = "integrated.rpca")

saveRDS(filtered_seurat.list,"./filtered_seurat.ProjectIntegration_seurat_sketch_inte.RDS")

print("Step 11:# Run the standard workflow ProjectData on the sketched")
Project_seurat_sketch_inte <- ProjectData(object = Project_seurat_sketch_inte,
sketched.assay = "sketch", assay = "RNA",
sketched.reduction = "integrated.rpca.full",
full.reduction = "integrated.rpca.full", dims = 1:30,
refdata = list(celltype.full = "celltype.manual"))
saveRDS(filtered_seurat.list,"./filtered_seurat.ProjecData_seurat_sketch_inte.RDS")

print("Step 11:# Run the standard workflow RunUMAP on the sketched")
Project_seurat_sketch_inte <- RunUMAP(Project_seurat_sketch_inte,
reduction = "integrated.rpca.full", dims = 1:30,
reduction.name = "umap.full",reduction.key = "UMAP_full_")

Project_seurat_sketch_inte$celltype.full <- factor(Project_seurat_sketch_inte$celltype.full,
ordered = TRUE,
levels = sort(unique(as.numeric(filtered_seurat_inte$celltype.full))))

Idents(Project_seurat_sketch_inte) <- "celltype.full"

print(" Visualization")
p1 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "Patient",raster=TRUE)
p2 <- DimPlot(Project_seurat_sketch_inte, reduction = "integrated.rpca.full",
group.by = "celltype.full",label = TRUE,repel = TRUE, raster=TRUE)

pdf("UMAP.pdf", height = 20, width = 30)
p1 + p2
dev.off()

print("Save the final RDS")
saveRDS(Project_seurat_sketch_inte,"./all_filtered_seurat_normalized_sketch_integrated.RDS")
`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant