vignettes/PRECAST.DLPFC.Rmd
PRECAST.DLPFC.Rmd
This vignette introduces the PRECAST workflow for the analysis of single spatial transcriptomics dataset. The workflow consists of three steps
We demonstrate the use of PRECAST to one human dorsolateral prefrontal cortex Visium data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/dlpfc_151672.rda?raw=true"
download.file(githubURL, "dlpfc_151672.rda", mode = "wb")
Then load to R
load("dlpfc_151672.rda")
The package can be loaded with the command:
First, we view the the spatial transcriptomics data with Visium platform.
dlpfc_151672 ## a list including two Seurat object
#> An object of class Seurat
#> 33538 features across 4015 samples within 1 assay
#> Active assay: RNA (33538 features, 0 variable features)
check the meta data that must include the spatial coordinates named “row” and “col”, respectively. If the names are not, they are required to rename them.
meta_data <- dlpfc_151672@meta.data
all(c("row", "col") %in% colnames(meta_data)) ## the names are correct!
#> [1] TRUE
head(meta_data[, c("row", "col")])
#> row col
#> AAACAAGTATCTCCCA-1 50 102
#> AAACACCAATAACTGC-1 59 19
#> AAACAGAGCGACTCCT-1 14 94
#> AAACAGCTTTCAGAAG-1 43 9
#> AAACAGGGTCTATATT-1 47 13
#> AAACATTTCCCGGATT-1 61 97
Create a PRECASTObj object to prepare for PRECAST models. Here, we only show the HVGs
method to select the 2000 highly variable genes, but users are able to choose more genes or also use the SPARK-X
to choose the spatially variable genes.
set.seed(2023)
library(PRECAST)
preobj <- CreatePRECASTObject(seuList = list(dlpfc_151672), selectGenesMethod = "HVGs", gene.number = 2000) #
## check the number of genes/features after filtering step
preobj@seulist
#> [[1]]
#> An object of class Seurat
#> 2000 features across 4015 samples within 1 assay
#> Active assay: RNA (2000 features, 2000 variable features)
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(preobj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, coreNum = 1, maxIter = 30, verbose = TRUE)
For function PRECAST
, users can specify the number of clusters \(K\) or set K
to be an integer vector by using modified BIC(MBIC) to determine \(K\). Here, we use user-specified number of clusters.
### Given K
PRECASTObj <- PRECAST(PRECASTObj, K = 7)
#> fitting ...
#>
|
| | 0%
|
|=================================== | 50%
|
|======================================================================| 100%
#> variable initialize finish!
#> predict Y and V!
#> diff Energy = 4.357544
#> Finish ICM step!
#> iter = 2, loglik= 720694.125000, dloglik=1.000336
#> predict Y and V!
#> diff Energy = 0.043552
#> Finish ICM step!
#> iter = 3, loglik= 736531.125000, dloglik=0.021975
#> predict Y and V!
#> diff Energy = 0.107110
#> Finish ICM step!
#> iter = 4, loglik= 742609.375000, dloglik=0.008253
#> predict Y and V!
#> diff Energy = 5.270334
#> Finish ICM step!
#> iter = 5, loglik= 745399.875000, dloglik=0.003758
#> predict Y and V!
#> diff Energy = 0.852004
#> Finish ICM step!
#> iter = 6, loglik= 747061.250000, dloglik=0.002229
#> predict Y and V!
#> diff Energy = 8.872446
#> Finish ICM step!
#> iter = 7, loglik= 748162.437500, dloglik=0.001474
#> predict Y and V!
#> diff Energy = 10.323227
#> Finish ICM step!
#> iter = 8, loglik= 748905.937500, dloglik=0.000994
#> predict Y and V!
#> diff Energy = 3.204909
#> Finish ICM step!
#> iter = 9, loglik= 749436.375000, dloglik=0.000708
#> predict Y and V!
#> diff Energy = 10.411241
#> Finish ICM step!
#> iter = 10, loglik= 749835.500000, dloglik=0.000533
#> predict Y and V!
#> diff Energy = 3.299855
#> Finish ICM step!
#> iter = 11, loglik= 750150.312500, dloglik=0.000420
#> predict Y and V!
#> diff Energy = 8.352915
#> Finish ICM step!
#> iter = 12, loglik= 750402.625000, dloglik=0.000336
#> predict Y and V!
#> diff Energy = 4.246870
#> Finish ICM step!
#> iter = 13, loglik= 750611.250000, dloglik=0.000278
#> predict Y and V!
#> diff Energy = 10.693586
#> Finish ICM step!
#> iter = 14, loglik= 750778.875000, dloglik=0.000223
#> predict Y and V!
#> diff Energy = 4.404526
#> Finish ICM step!
#> iter = 15, loglik= 750928.625000, dloglik=0.000199
#> predict Y and V!
#> diff Energy = 7.872126
#> Finish ICM step!
#> iter = 16, loglik= 751058.375000, dloglik=0.000173
#> predict Y and V!
#> diff Energy = 1.442411
#> Finish ICM step!
#> iter = 17, loglik= 751176.250000, dloglik=0.000157
#> predict Y and V!
#> diff Energy = 6.422405
#> Finish ICM step!
#> iter = 18, loglik= 751273.625000, dloglik=0.000130
#> predict Y and V!
#> diff Energy = 3.737015
#> Finish ICM step!
#> iter = 19, loglik= 751367.812500, dloglik=0.000125
#> predict Y and V!
#> diff Energy = 2.975575
#> Finish ICM step!
#> iter = 20, loglik= 751445.375000, dloglik=0.000103
#> predict Y and V!
#> diff Energy = 1.618512
#> Finish ICM step!
#> iter = 21, loglik= 751520.062500, dloglik=0.000099
#> predict Y and V!
#> diff Energy = 1.663198
#> Finish ICM step!
#> iter = 22, loglik= 751574.187500, dloglik=0.000072
#> predict Y and V!
#> diff Energy = 5.080723
#> Finish ICM step!
#> iter = 23, loglik= 751639.687500, dloglik=0.000087
#> predict Y and V!
#> diff Energy = 1.346914
#> Finish ICM step!
#> iter = 24, loglik= 751708.500000, dloglik=0.000092
#> predict Y and V!
#> diff Energy = 0.617952
#> Finish ICM step!
#> iter = 25, loglik= 751756.312500, dloglik=0.000064
#> predict Y and V!
#> diff Energy = 2.145776
#> Finish ICM step!
#> iter = 26, loglik= 751795.812500, dloglik=0.000053
#> predict Y and V!
#> diff Energy = 1.444635
#> Finish ICM step!
#> iter = 27, loglik= 751831.187500, dloglik=0.000047
#> predict Y and V!
#> diff Energy = 1.244554
#> Finish ICM step!
#> iter = 28, loglik= 751862.750000, dloglik=0.000042
#> predict Y and V!
#> diff Energy = 7.400049
#> Finish ICM step!
#> iter = 29, loglik= 751889.437500, dloglik=0.000035
#> predict Y and V!
#> diff Energy = 2.148208
#> Finish ICM step!
#> iter = 30, loglik= 751919.687500, dloglik=0.000040
Use the function SelectModel()
to re-organize the fitted results in PRECASTObj.
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- SelectModel(PRECASTObj)
ari_precast <- mclust::adjustedRandIndex(PRECASTObj@resList$cluster[[1]], PRECASTObj@seulist[[1]]$layer_guess_reordered)
Other options
Users are also able to set multiple K, then choose the best one
PRECASTObj2 <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, coreNum = 4, maxIter = 30, verbose = TRUE) # set 4 cores to run in parallel.
PRECASTObj2 <- PRECAST(PRECASTObj2, K = 5:8)
## backup the fitting results in resList
resList2 <- PRECASTObj2@resList
PRECASTObj2 <- SelectModel(PRECASTObj2)
str(PRECASTObj2@resList)
mclust::adjustedRandIndex(PRECASTObj2@resList$cluster[[1]], PRECASTObj2@seulist[[1]]$layer_guess_reordered)
Besides, user can also use different initialization method by setting int.model
, for example, set int.model=NULL
; see the functions AddParSetting()
and model_set()
for more details.
Put the reults into a Seurat object seuInt.
seuInt <- PRECASTObj@seulist[[1]]
seuInt@meta.data$cluster <- factor(unlist(PRECASTObj@resList$cluster))
seuInt@meta.data$batch <- 1
seuInt <- Add_embed(PRECASTObj@resList$hZ[[1]], seuInt, embed_name = "PRECAST")
posList <- lapply(PRECASTObj@seulist, function(x) cbind(x$row, x$col))
seuInt <- Add_embed(posList[[1]], seuInt, embed_name = "position")
Idents(seuInt) <- factor(seuInt@meta.data$cluster)
seuInt
#> An object of class Seurat
#> 2000 features across 4015 samples within 1 assay
#> Active assay: RNA (2000 features, 2000 variable features)
#> 1 dimensional reduction calculated: PRECAST
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
Save the spatial and tSNE scatter plots for clusters from PRECAST
p_sp1 <- SpaPlot(seuInt, item = "cluster", point_size = 3, combine = F)[[1]] + cowplot::theme_cowplot() +
ggplot2::ggtitle(paste0("PRECAST: ARI=", round(ari_precast, 2))) + ggplot2::xlab("row") + ggplot2::ylab("col")
seuInt <- AddTSNE(seuInt, n_comp = 2)
p_tsne <- dimPlot(seuInt, item = "cluster")
p_tsne <- p_tsne + cowplot::theme_cowplot() + ggplot2::ggtitle("PRECAST")
Fit DR-SC and Plot the spatial and tSNE scatter plots for clusters
seu_drsc <- DR.SC::DR.SC(PRECASTObj@seulist[[1]], K = 7, verbose = T)
#> iter = 2, loglik= 731518.206960, dloglik=1.000341
#> iter = 3, loglik= 738726.830540, dloglik=0.009854
#> iter = 4, loglik= 741042.145388, dloglik=0.003134
#> iter = 5, loglik= 742243.453375, dloglik=0.001621
#> iter = 6, loglik= 743007.759324, dloglik=0.001030
#> iter = 7, loglik= 743526.931419, dloglik=0.000699
#> iter = 8, loglik= 743896.859337, dloglik=0.000498
#> iter = 9, loglik= 744181.669833, dloglik=0.000383
#> iter = 10, loglik= 744422.250825, dloglik=0.000323
#> iter = 11, loglik= 744622.719658, dloglik=0.000269
#> iter = 12, loglik= 744794.842008, dloglik=0.000231
#> iter = 13, loglik= 744951.307556, dloglik=0.000210
#> iter = 14, loglik= 745086.270102, dloglik=0.000181
#> iter = 15, loglik= 745197.659202, dloglik=0.000149
#> iter = 16, loglik= 745308.735367, dloglik=0.000149
#> iter = 17, loglik= 745407.013373, dloglik=0.000132
#> iter = 18, loglik= 745494.350941, dloglik=0.000117
#> iter = 19, loglik= 745568.379036, dloglik=0.000099
#> iter = 20, loglik= 745634.241643, dloglik=0.000088
#> iter = 21, loglik= 745702.910177, dloglik=0.000092
#> iter = 22, loglik= 745756.835958, dloglik=0.000072
#> iter = 23, loglik= 745802.778415, dloglik=0.000062
#> iter = 24, loglik= 745849.510135, dloglik=0.000063
#> iter = 25, loglik= 745886.892357, dloglik=0.000050
ari_drsc <- mclust::adjustedRandIndex(seu_drsc$spatial.drsc.cluster, PRECASTObj@seulist[[1]]$layer_guess_reordered)
p_tsne_drsc <- DR.SC::drscPlot(seu_drsc)
p_tsne_drsc <- p_tsne_drsc + ggplot2::ggtitle("DR-SC")
p_sp2 <- DR.SC::spatialPlotClusters(seu_drsc) + cowplot::theme_cowplot() + ggplot2::ggtitle(paste0("DR-SC ARI=",
round(ari_drsc, 2)))
Compare the clustering performance of PRECAST and DR-SC.
Compare the tSNE visualiztion performance of PRECAST and DR-SC.
Session Info
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22621)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] parallel stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] sp_1.5-0 SeuratObject_4.1.0 Seurat_4.1.1 PRECAST_1.6
#> [5] gtools_3.9.2.2
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.3 reticulate_1.25
#> [3] tidyselect_1.1.2 htmlwidgets_1.5.4
#> [5] grid_4.1.2 BiocParallel_1.28.3
#> [7] Rtsne_0.16 munsell_0.5.0
#> [9] ScaledMatrix_1.2.0 codetools_0.2-18
#> [11] ragg_1.2.2 ica_1.0-2
#> [13] future_1.26.1 miniUI_0.1.1.1
#> [15] withr_2.5.0 spatstat.random_2.2-0
#> [17] colorspace_2.1-0 progressr_0.10.1
#> [19] Biobase_2.54.0 highr_0.9
#> [21] knitr_1.37 rstudioapi_0.13
#> [23] stats4_4.1.2 SingleCellExperiment_1.16.0
#> [25] ROCR_1.0-11 ggsignif_0.6.3
#> [27] tensor_1.5 listenv_0.8.0
#> [29] labeling_0.4.2 MatrixGenerics_1.6.0
#> [31] GenomeInfoDbData_1.2.7 polyclip_1.10-0
#> [33] farver_2.1.1 rprojroot_2.0.3
#> [35] parallelly_1.32.0 vctrs_0.6.1
#> [37] generics_0.1.2 xfun_0.29
#> [39] ggthemes_4.2.4 R6_2.5.1
#> [41] GenomeInfoDb_1.30.1 ggbeeswarm_0.6.0
#> [43] rsvd_1.0.5 bitops_1.0-7
#> [45] spatstat.utils_3.0-1 cachem_1.0.6
#> [47] DelayedArray_0.20.0 assertthat_0.2.1
#> [49] promises_1.2.0.1 scales_1.2.1
#> [51] rgeos_0.5-9 beeswarm_0.4.0
#> [53] gtable_0.3.3 beachmat_2.10.0
#> [55] globals_0.15.0 goftest_1.2-3
#> [57] rlang_1.1.0 systemfonts_1.0.4
#> [59] splines_4.1.2 rstatix_0.7.0
#> [61] lazyeval_0.2.2 broom_0.7.12
#> [63] spatstat.geom_2.4-0 yaml_2.3.6
#> [65] reshape2_1.4.4 abind_1.4-5
#> [67] backports_1.4.1 httpuv_1.6.5
#> [69] tools_4.1.2 ggplot2_3.4.1
#> [71] ellipsis_0.3.2 spatstat.core_2.4-4
#> [73] jquerylib_0.1.4 RColorBrewer_1.1-3
#> [75] BiocGenerics_0.40.0 ggridges_0.5.3
#> [77] Rcpp_1.0.10 plyr_1.8.7
#> [79] sparseMatrixStats_1.6.0 zlibbioc_1.40.0
#> [81] purrr_0.3.4 RCurl_1.98-1.6
#> [83] ggpubr_0.4.0 rpart_4.1.16
#> [85] deldir_1.0-6 viridis_0.6.2
#> [87] pbapply_1.5-0 cowplot_1.1.1
#> [89] S4Vectors_0.32.3 zoo_1.8-10
#> [91] SummarizedExperiment_1.24.0 ggrepel_0.9.1
#> [93] cluster_2.1.2 fs_1.5.2
#> [95] magrittr_2.0.3 GiRaF_1.0.1
#> [97] data.table_1.14.2 scattermore_0.8
#> [99] lmtest_0.9-40 RANN_2.6.1
#> [101] fitdistrplus_1.1-8 matrixStats_0.62.0
#> [103] patchwork_1.1.1 mime_0.12
#> [105] evaluate_0.15 xtable_1.8-4
#> [107] mclust_5.4.10 IRanges_2.28.0
#> [109] gridExtra_2.3 compiler_4.1.2
#> [111] scater_1.25.1 tibble_3.2.1
#> [113] KernSmooth_2.23-20 crayon_1.5.1
#> [115] htmltools_0.5.2 mgcv_1.8-39
#> [117] later_1.3.0 tidyr_1.2.0
#> [119] DBI_1.1.2 formatR_1.11
#> [121] MASS_7.3-55 car_3.0-12
#> [123] Matrix_1.4-0 cli_3.2.0
#> [125] igraph_1.3.5 DR.SC_3.2
#> [127] GenomicRanges_1.46.1 pkgconfig_2.0.3
#> [129] pkgdown_2.0.6 plotly_4.10.0
#> [131] scuttle_1.4.0 spatstat.sparse_2.1-1
#> [133] vipor_0.4.5 bslib_0.3.1
#> [135] XVector_0.34.0 CompQuadForm_1.4.3
#> [137] stringr_1.4.0 digest_0.6.29
#> [139] sctransform_0.3.3 RcppAnnoy_0.0.19
#> [141] spatstat.data_3.0-0 rmarkdown_2.11
#> [143] leiden_0.4.2 uwot_0.1.11
#> [145] DelayedMatrixStats_1.16.0 shiny_1.7.1
#> [147] lifecycle_1.0.3 nlme_3.1-155
#> [149] jsonlite_1.8.0 carData_3.0-5
#> [151] BiocNeighbors_1.12.0 desc_1.4.0
#> [153] viridisLite_0.4.1 fansi_1.0.4
#> [155] pillar_1.9.0 lattice_0.20-45
#> [157] fastmap_1.1.0 httr_1.4.3
#> [159] survival_3.2-13 glue_1.6.2
#> [161] png_0.1-7 stringi_1.7.6
#> [163] sass_0.4.1 textshaping_0.3.6
#> [165] BiocSingular_1.10.0 memoise_2.0.1
#> [167] dplyr_1.0.9 irlba_2.3.5
#> [169] future.apply_1.9.0