This vignette introduces the PRECAST workflow for the analysis of integrating multiple spatial transcriptomics datasets. The workflow consists of three steps

  • Independent preprocessing and model setting
  • Probabilistic embedding, clustering and alignment using PRECAST model
  • Downstream analysis (i.e. visualization of clusters and embeddings, combined differential expression analysis)

We demonstrate the use of PRECAST to two sliced human breast cancer Visium data that are here, which can be downloaded to the current working path by the following command:

githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")

Then load to R

load("bc2.rda")

This data is also available at 10X genomics data website:

Users require the two folders for each dataset: spatial and filtered_feature_bc_matrix. Then the data can be read by the following commond.

# library(DR.SC) dir.file <- 'Section' ## the folders Section1 and Section2, and each includes
# two folders spatial and filtered_feature_bc_matrix seuList <- list() for (r in 1:2) {
# message('r = ', r) seuList[[r]] <- read10XVisium(paste0(dir.file, r)) } bc2 <- seuList

The package can be loaded with the command:

Fit PRECAST using this data

View human breast cancer Visium data from DataPRECAST

bc2  ## a list including two Seurat object
#> [[1]]
#> An object of class Seurat 
#> 36601 features across 3798 samples within 1 assay 
#> Active assay: RNA (36601 features, 0 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 36601 features across 3987 samples within 1 assay 
#> Active assay: RNA (36601 features, 0 variable features)

Check the content in bc2

head(bc2[[1]])
#>                       orig.ident nCount_RNA nFeature_RNA               spot
#> CACGATTGGTCGTTAA-1 SeuratProject       6853         3026 CACGATTGGTCGTTAA-1
#> GGTTGTATCGTGAAAT-1 SeuratProject       8679         3646 GGTTGTATCGTGAAAT-1
#> TCTTATGGGTAGTACC-1 SeuratProject       5093         2449 TCTTATGGGTAGTACC-1
#> TACAAGCTGTTCACTG-1 SeuratProject       9432         3708 TACAAGCTGTTCACTG-1
#> GTATCTTGTTGCTCAC-1 SeuratProject       8610         3512 GTATCTTGTTGCTCAC-1
#> ATACCAGGTGAGCGAT-1 SeuratProject       9974         3901 ATACCAGGTGAGCGAT-1
#> CCTAAACAGGGTCCGT-1 SeuratProject       7232         3173 CCTAAACAGGGTCCGT-1
#> ATGGTGCTCAAAGCCA-1 SeuratProject       6025         2867 ATGGTGCTCAAAGCCA-1
#> CAAATGCGGAGTGTTC-1 SeuratProject       8822         3678 CAAATGCGGAGTGTTC-1
#> CGTGCCCGACATTTGT-1 SeuratProject       6364         2932 CGTGCCCGACATTTGT-1
#>                    in_tissue row col imagerow imagecol
#> CACGATTGGTCGTTAA-1         1   0  50     4046    10350
#> GGTTGTATCGTGAAAT-1         1   1  51     4284    10486
#> TCTTATGGGTAGTACC-1         1   0  52     4047    10623
#> TACAAGCTGTTCACTG-1         1   1  53     4285    10759
#> GTATCTTGTTGCTCAC-1         1   0  54     4047    10896
#> ATACCAGGTGAGCGAT-1         1   1  55     4285    11032
#> CCTAAACAGGGTCCGT-1         1   0  56     4048    11169
#> ATGGTGCTCAAAGCCA-1         1   1  57     4286    11305
#> CAAATGCGGAGTGTTC-1         1   0  58     4048    11442
#> CGTGCCCGACATTTGT-1         1   1  59     4286    11578

Prepare the PRECASTObject.

Human breast cancer data have been pre-processed and saved as the PRECASTObj format.

## Create PRECASTObject.
set.seed(2022)
PRECASTObj <- CreatePRECASTObject(bc2)

Add the model setting

## check the number of genes/features after filtering step
PRECASTObj@seulist
#> [[1]]
#> An object of class Seurat 
#> 2000 features across 3798 samples within 1 assay 
#> Active assay: RNA (2000 features, 1699 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 2000 features across 3987 samples within 1 assay 
#> Active assay: RNA (2000 features, 1572 variable features)

## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")

## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, verbose = TRUE, int.model = NULL)

Fit PRECAST

For function PRECAST, users can specify the number of clusters \(K\) or set K to be an integer vector by using modified BIC(MBIC) to determine \(K\). First, we try using user-specified number of clusters. For convenience, we give the selected number of clusters by MBIC (K=14).

### Given K
PRECASTObj <- PRECAST(PRECASTObj, K = 14)
#> fitting ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |======================================================================| 100%
#> variable initialize finish! 
#> predict Y and V! 
#> Finish ICM step! 
#> iter = 2, loglik= -538710.062500, dloglik=0.999749 
#> predict Y and V! 
#> diff Energy = 1.204197 
#> diff Energy = 0.877895 
#> Finish ICM step! 
#> iter = 3, loglik= -495364.531250, dloglik=0.080462 
#> predict Y and V! 
#> diff Energy = 2.324026 
#> diff Energy = 7.982584 
#> Finish ICM step! 
#> iter = 4, loglik= -475213.562500, dloglik=0.040679 
#> predict Y and V! 
#> diff Energy = 0.648297 
#> Finish ICM step! 
#> iter = 5, loglik= -464792.156250, dloglik=0.021930 
#> predict Y and V! 
#> diff Energy = 3.508402 
#> diff Energy = 0.904503 
#> Finish ICM step! 
#> iter = 6, loglik= -458507.812500, dloglik=0.013521 
#> predict Y and V! 
#> diff Energy = 0.684931 
#> Finish ICM step! 
#> iter = 7, loglik= -454906.812500, dloglik=0.007854 
#> predict Y and V! 
#> Finish ICM step! 
#> iter = 8, loglik= -452424.218750, dloglik=0.005457 
#> predict Y and V! 
#> diff Energy = 3.822610 
#> diff Energy = 0.299825 
#> Finish ICM step! 
#> iter = 9, loglik= -450869.906250, dloglik=0.003436 
#> predict Y and V! 
#> diff Energy = 0.751252 
#> Finish ICM step! 
#> iter = 10, loglik= -449685.062500, dloglik=0.002628 
#> predict Y and V! 
#> diff Energy = 0.487209 
#> Finish ICM step! 
#> iter = 11, loglik= -448862.968750, dloglik=0.001828 
#> predict Y and V! 
#> diff Energy = 8.570347 
#> Finish ICM step! 
#> iter = 12, loglik= -448206.750000, dloglik=0.001462 
#> predict Y and V! 
#> diff Energy = 1.446884 
#> Finish ICM step! 
#> iter = 13, loglik= -447728.593750, dloglik=0.001067 
#> predict Y and V! 
#> diff Energy = 9.361322 
#> diff Energy = 0.618086 
#> Finish ICM step! 
#> iter = 14, loglik= -447314.156250, dloglik=0.000926 
#> predict Y and V! 
#> diff Energy = 4.770555 
#> diff Energy = 0.958841 
#> Finish ICM step! 
#> iter = 15, loglik= -446979.437500, dloglik=0.000748 
#> predict Y and V! 
#> diff Energy = 8.616039 
#> Finish ICM step! 
#> iter = 16, loglik= -446690.750000, dloglik=0.000646 
#> predict Y and V! 
#> diff Energy = 4.927937 
#> diff Energy = 2.274673 
#> Finish ICM step! 
#> iter = 17, loglik= -446446.562500, dloglik=0.000547 
#> predict Y and V! 
#> diff Energy = 6.601925 
#> Finish ICM step! 
#> iter = 18, loglik= -446228.156250, dloglik=0.000489 
#> predict Y and V! 
#> diff Energy = 5.054816 
#> Finish ICM step! 
#> iter = 19, loglik= -446018.343750, dloglik=0.000470 
#> predict Y and V! 
#> diff Energy = 0.942972 
#> diff Energy = 0.532784 
#> Finish ICM step! 
#> iter = 20, loglik= -445887.406250, dloglik=0.000294

Select a best model

## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- selectModel(PRECASTObj)

Integrate the two samples by the function IntegrateSpaData.

seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
#> An object of class Seurat 
#> 2000 features across 7785 samples within 1 assay 
#> Active assay: PRE_CAST (2000 features, 0 variable features)
#>  2 dimensional reductions calculated: PRECAST, position
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.

Show the spatial scatter plot for clusters

cols_cluster <- c("#FD7446", "#709AE1", "#31A354", "#9EDAE5", "#DE9ED6", "#BCBD22", "#CE6DBD", "#DADAEB",
    "yellow", "#FF9896", "#91D1C2", "#C7E9C0", "#6B6ECF", "#7B4173")
p12 <- SpaPlot(seuInt, batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE)
p12

# users can plot each sample by setting combine=FALSE

Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.

seuInt <- AddUMAP(seuInt)
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 2, combine = TRUE, text_size = 15)
p13

# seuInt <- AddTSNE(seuInt) SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2,
# combine=T, text_size=15)

Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.

seuInt <- AddTSNE(seuInt, n_comp = 2)
library(patchwork)
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster)  # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif")
p1 + p2

Combined differential expression analysis

library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 10
dat_deg %>%
    group_by(cluster) %>%
    top_n(n = n, wt = avg_log2FC) -> top10

seuInt <- ScaleData(seuInt)
seus <- subset(seuInt, downsample = 400)

Plot DE genes’ heatmap for each spatial domains identified by PRECAST.

color_id <- as.numeric(levels(Idents(seus)))
library(ggplot2)
## HeatMap
p1 <- doHeatmap(seus, features = top10$gene, cell_label = "Domain", grp_label = F, grp_color = cols_cluster[color_id],
    pt_size = 6, slot = "scale.data") + theme(legend.text = element_text(size = 10), legend.title = element_text(size = 13,
    face = "bold"), axis.text.y = element_text(size = 5, face = "italic", family = "serif"))
p1

Session information

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252   
#> [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C                      
#> [5] LC_TIME=English_Singapore.1252    
#> system code page: 936
#> 
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] ggplot2_3.3.6      dplyr_1.0.9        patchwork_1.1.1    sp_1.5-0          
#> [5] SeuratObject_4.1.0 Seurat_4.1.1       PRECAST_1.3        gtools_3.9.2.2    
#> 
#> loaded via a namespace (and not attached):
#>   [1] utf8_1.2.2                  reticulate_1.25            
#>   [3] tidyselect_1.1.2            htmlwidgets_1.5.4          
#>   [5] grid_4.1.2                  BiocParallel_1.28.3        
#>   [7] Rtsne_0.16                  munsell_0.5.0              
#>   [9] ScaledMatrix_1.2.0          codetools_0.2-18           
#>  [11] ragg_1.2.2                  ica_1.0-2                  
#>  [13] future_1.26.1               miniUI_0.1.1.1             
#>  [15] withr_2.5.0                 spatstat.random_2.2-0      
#>  [17] colorspace_2.0-3            progressr_0.10.1           
#>  [19] Biobase_2.54.0              highr_0.9                  
#>  [21] knitr_1.37                  rstudioapi_0.13            
#>  [23] stats4_4.1.2                SingleCellExperiment_1.16.0
#>  [25] ROCR_1.0-11                 tensor_1.5                 
#>  [27] listenv_0.8.0               labeling_0.4.2             
#>  [29] MatrixGenerics_1.6.0        GenomeInfoDbData_1.2.7     
#>  [31] polyclip_1.10-0             farver_2.1.0               
#>  [33] rprojroot_2.0.3             parallelly_1.32.0          
#>  [35] vctrs_0.4.1                 generics_0.1.2             
#>  [37] xfun_0.29                   ggthemes_4.2.4             
#>  [39] R6_2.5.1                    GenomeInfoDb_1.30.1        
#>  [41] ggbeeswarm_0.6.0            rsvd_1.0.5                 
#>  [43] bitops_1.0-7                spatstat.utils_2.3-1       
#>  [45] cachem_1.0.6                DelayedArray_0.20.0        
#>  [47] assertthat_0.2.1            promises_1.2.0.1           
#>  [49] scales_1.2.0                rgeos_0.5-9                
#>  [51] beeswarm_0.4.0              gtable_0.3.0               
#>  [53] beachmat_2.10.0             globals_0.15.0             
#>  [55] goftest_1.2-3               rlang_1.0.2                
#>  [57] systemfonts_1.0.4           splines_4.1.2              
#>  [59] lazyeval_0.2.2              spatstat.geom_2.4-0        
#>  [61] yaml_2.3.5                  reshape2_1.4.4             
#>  [63] abind_1.4-5                 httpuv_1.6.5               
#>  [65] tools_4.1.2                 ellipsis_0.3.2             
#>  [67] spatstat.core_2.4-4         jquerylib_0.1.4            
#>  [69] RColorBrewer_1.1-3          BiocGenerics_0.40.0        
#>  [71] ggridges_0.5.3              Rcpp_1.0.8.3               
#>  [73] plyr_1.8.7                  sparseMatrixStats_1.6.0    
#>  [75] zlibbioc_1.40.0             purrr_0.3.4                
#>  [77] RCurl_1.98-1.6              rpart_4.1.16               
#>  [79] deldir_1.0-6                viridis_0.6.2              
#>  [81] pbapply_1.5-0               cowplot_1.1.1              
#>  [83] S4Vectors_0.32.3            zoo_1.8-10                 
#>  [85] SummarizedExperiment_1.24.0 ggrepel_0.9.1              
#>  [87] cluster_2.1.2               fs_1.5.2                   
#>  [89] magrittr_2.0.3              GiRaF_1.0.1                
#>  [91] data.table_1.14.2           scattermore_0.8            
#>  [93] lmtest_0.9-40               RANN_2.6.1                 
#>  [95] fitdistrplus_1.1-8          matrixStats_0.62.0         
#>  [97] mime_0.12                   evaluate_0.15              
#>  [99] xtable_1.8-4                mclust_5.4.10              
#> [101] IRanges_2.28.0              gridExtra_2.3              
#> [103] compiler_4.1.2              scater_1.25.1              
#> [105] tibble_3.1.7                KernSmooth_2.23-20         
#> [107] crayon_1.5.1                htmltools_0.5.2            
#> [109] mgcv_1.8-39                 later_1.3.0                
#> [111] tidyr_1.2.0                 DBI_1.1.2                  
#> [113] formatR_1.11                MASS_7.3-55                
#> [115] Matrix_1.4-0                cli_3.2.0                  
#> [117] igraph_1.3.2                DR.SC_3.0                  
#> [119] GenomicRanges_1.46.1        pkgconfig_2.0.3            
#> [121] pkgdown_2.0.2               plotly_4.10.0              
#> [123] scuttle_1.4.0               spatstat.sparse_2.1-1      
#> [125] vipor_0.4.5                 bslib_0.3.1                
#> [127] XVector_0.34.0              CompQuadForm_1.4.3         
#> [129] stringr_1.4.0               digest_0.6.29              
#> [131] sctransform_0.3.3           RcppAnnoy_0.0.19           
#> [133] spatstat.data_2.2-0         rmarkdown_2.11             
#> [135] leiden_0.4.2                uwot_0.1.11                
#> [137] DelayedMatrixStats_1.16.0   shiny_1.7.1                
#> [139] lifecycle_1.0.1             nlme_3.1-155               
#> [141] jsonlite_1.8.0              BiocNeighbors_1.12.0       
#> [143] limma_3.50.1                desc_1.4.0                 
#> [145] viridisLite_0.4.0           fansi_1.0.3                
#> [147] pillar_1.7.0                lattice_0.20-45            
#> [149] fastmap_1.1.0               httr_1.4.3                 
#> [151] survival_3.2-13             glue_1.6.2                 
#> [153] png_0.1-7                   stringi_1.7.6              
#> [155] sass_0.4.1                  textshaping_0.3.6          
#> [157] BiocSingular_1.10.0         memoise_2.0.1              
#> [159] irlba_2.3.5                 future.apply_1.9.0