vignettes/PRECAST.BreastCancer.Rmd
PRECAST.BreastCancer.Rmd
This vignette introduces the PRECAST workflow for the analysis of integrating multiple spatial transcriptomics datasets. The workflow consists of three steps
We demonstrate the use of PRECAST to two sliced human breast cancer Visium data that are here, which can be downloaded to the current working path by the following command:
githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")
Then load to R
load("bc2.rda")
This data is also available at 10X genomics data website:
Users require the two folders for each dataset: spatial and filtered_feature_bc_matrix. Then the data can be read by the following commond.
# library(DR.SC) dir.file <- 'Section' ## the folders Section1 and Section2, and each includes
# two folders spatial and filtered_feature_bc_matrix seuList <- list() for (r in 1:2) {
# message('r = ', r) seuList[[r]] <- read10XVisium(paste0(dir.file, r)) } bc2 <- seuList
The package can be loaded with the command:
View human breast cancer Visium data from DataPRECAST
bc2 ## a list including two Seurat object
#> [[1]]
#> An object of class Seurat
#> 36601 features across 3798 samples within 1 assay
#> Active assay: RNA (36601 features, 0 variable features)
#>
#> [[2]]
#> An object of class Seurat
#> 36601 features across 3987 samples within 1 assay
#> Active assay: RNA (36601 features, 0 variable features)
Check the content in bc2
head(bc2[[1]])
#> orig.ident nCount_RNA nFeature_RNA spot
#> CACGATTGGTCGTTAA-1 SeuratProject 6853 3026 CACGATTGGTCGTTAA-1
#> GGTTGTATCGTGAAAT-1 SeuratProject 8679 3646 GGTTGTATCGTGAAAT-1
#> TCTTATGGGTAGTACC-1 SeuratProject 5093 2449 TCTTATGGGTAGTACC-1
#> TACAAGCTGTTCACTG-1 SeuratProject 9432 3708 TACAAGCTGTTCACTG-1
#> GTATCTTGTTGCTCAC-1 SeuratProject 8610 3512 GTATCTTGTTGCTCAC-1
#> ATACCAGGTGAGCGAT-1 SeuratProject 9974 3901 ATACCAGGTGAGCGAT-1
#> CCTAAACAGGGTCCGT-1 SeuratProject 7232 3173 CCTAAACAGGGTCCGT-1
#> ATGGTGCTCAAAGCCA-1 SeuratProject 6025 2867 ATGGTGCTCAAAGCCA-1
#> CAAATGCGGAGTGTTC-1 SeuratProject 8822 3678 CAAATGCGGAGTGTTC-1
#> CGTGCCCGACATTTGT-1 SeuratProject 6364 2932 CGTGCCCGACATTTGT-1
#> in_tissue row col imagerow imagecol
#> CACGATTGGTCGTTAA-1 1 0 50 4046 10350
#> GGTTGTATCGTGAAAT-1 1 1 51 4284 10486
#> TCTTATGGGTAGTACC-1 1 0 52 4047 10623
#> TACAAGCTGTTCACTG-1 1 1 53 4285 10759
#> GTATCTTGTTGCTCAC-1 1 0 54 4047 10896
#> ATACCAGGTGAGCGAT-1 1 1 55 4285 11032
#> CCTAAACAGGGTCCGT-1 1 0 56 4048 11169
#> ATGGTGCTCAAAGCCA-1 1 1 57 4286 11305
#> CAAATGCGGAGTGTTC-1 1 0 58 4048 11442
#> CGTGCCCGACATTTGT-1 1 1 59 4286 11578
Human breast cancer data have been pre-processed and saved as the PRECASTObj
format.
## Create PRECASTObject.
set.seed(2022)
PRECASTObj <- CreatePRECASTObject(bc2)
## check the number of genes/features after filtering step
PRECASTObj@seulist
#> [[1]]
#> An object of class Seurat
#> 2000 features across 3798 samples within 1 assay
#> Active assay: RNA (2000 features, 1699 variable features)
#>
#> [[2]]
#> An object of class Seurat
#> 2000 features across 3987 samples within 1 assay
#> Active assay: RNA (2000 features, 1572 variable features)
## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")
## Add a model setting in advance for a PRECASTObj object. verbose =TRUE helps outputing the
## information in the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, verbose = TRUE, int.model = NULL)
For function PRECAST
, users can specify the number of clusters \(K\) or set K
to be an integer vector by using modified BIC(MBIC) to determine \(K\). First, we try using user-specified number of clusters. For convenience, we give the selected number of clusters by MBIC (K=14).
### Given K
PRECASTObj <- PRECAST(PRECASTObj, K = 14)
#> fitting ...
#>
|
| | 0%
|
|===== | 7%
|
|========= | 13%
|
|============== | 20%
|
|=================== | 27%
|
|======================= | 33%
|
|============================ | 40%
|
|================================= | 47%
|
|===================================== | 53%
|
|========================================== | 60%
|
|=============================================== | 67%
|
|=================================================== | 73%
|
|======================================================== | 80%
|
|============================================================= | 87%
|
|================================================================= | 93%
|
|======================================================================| 100%
#> variable initialize finish!
#> predict Y and V!
#> Finish ICM step!
#> iter = 2, loglik= -538710.062500, dloglik=0.999749
#> predict Y and V!
#> diff Energy = 1.204197
#> diff Energy = 0.877895
#> Finish ICM step!
#> iter = 3, loglik= -495364.531250, dloglik=0.080462
#> predict Y and V!
#> diff Energy = 2.324026
#> diff Energy = 7.982584
#> Finish ICM step!
#> iter = 4, loglik= -475213.562500, dloglik=0.040679
#> predict Y and V!
#> diff Energy = 0.648297
#> Finish ICM step!
#> iter = 5, loglik= -464792.156250, dloglik=0.021930
#> predict Y and V!
#> diff Energy = 3.508402
#> diff Energy = 0.904503
#> Finish ICM step!
#> iter = 6, loglik= -458507.812500, dloglik=0.013521
#> predict Y and V!
#> diff Energy = 0.684931
#> Finish ICM step!
#> iter = 7, loglik= -454906.812500, dloglik=0.007854
#> predict Y and V!
#> Finish ICM step!
#> iter = 8, loglik= -452424.218750, dloglik=0.005457
#> predict Y and V!
#> diff Energy = 3.822610
#> diff Energy = 0.299825
#> Finish ICM step!
#> iter = 9, loglik= -450869.906250, dloglik=0.003436
#> predict Y and V!
#> diff Energy = 0.751252
#> Finish ICM step!
#> iter = 10, loglik= -449685.062500, dloglik=0.002628
#> predict Y and V!
#> diff Energy = 0.487209
#> Finish ICM step!
#> iter = 11, loglik= -448862.968750, dloglik=0.001828
#> predict Y and V!
#> diff Energy = 8.570347
#> Finish ICM step!
#> iter = 12, loglik= -448206.750000, dloglik=0.001462
#> predict Y and V!
#> diff Energy = 1.446884
#> Finish ICM step!
#> iter = 13, loglik= -447728.593750, dloglik=0.001067
#> predict Y and V!
#> diff Energy = 9.361322
#> diff Energy = 0.618086
#> Finish ICM step!
#> iter = 14, loglik= -447314.156250, dloglik=0.000926
#> predict Y and V!
#> diff Energy = 4.770555
#> diff Energy = 0.958841
#> Finish ICM step!
#> iter = 15, loglik= -446979.437500, dloglik=0.000748
#> predict Y and V!
#> diff Energy = 8.616039
#> Finish ICM step!
#> iter = 16, loglik= -446690.750000, dloglik=0.000646
#> predict Y and V!
#> diff Energy = 4.927937
#> diff Energy = 2.274673
#> Finish ICM step!
#> iter = 17, loglik= -446446.562500, dloglik=0.000547
#> predict Y and V!
#> diff Energy = 6.601925
#> Finish ICM step!
#> iter = 18, loglik= -446228.156250, dloglik=0.000489
#> predict Y and V!
#> diff Energy = 5.054816
#> Finish ICM step!
#> iter = 19, loglik= -446018.343750, dloglik=0.000470
#> predict Y and V!
#> diff Energy = 0.942972
#> diff Energy = 0.532784
#> Finish ICM step!
#> iter = 20, loglik= -445887.406250, dloglik=0.000294
Select a best model
## backup the fitting results in resList
resList <- PRECASTObj@resList
PRECASTObj <- selectModel(PRECASTObj)
Integrate the two samples by the function IntegrateSpaData
.
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
#> An object of class Seurat
#> 2000 features across 7785 samples within 1 assay
#> Active assay: PRE_CAST (2000 features, 0 variable features)
#> 2 dimensional reductions calculated: PRECAST, position
## The low-dimensional embeddings obtained by PRECAST are saved in PRECAST reduction slot.
Show the spatial scatter plot for clusters
cols_cluster <- c("#FD7446", "#709AE1", "#31A354", "#9EDAE5", "#DE9ED6", "#BCBD22", "#CE6DBD", "#DADAEB",
"yellow", "#FF9896", "#91D1C2", "#C7E9C0", "#6B6ECF", "#7B4173")
p12 <- SpaPlot(seuInt, batch = NULL, point_size = 1, cols = cols_cluster, combine = TRUE)
p12
# users can plot each sample by setting combine=FALSE
Show the spatial UMAP/tNSE RGB plot to illustrate the performance in extracting features.
seuInt <- AddUMAP(seuInt)
p13 <- SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 2, combine = TRUE, text_size = 15)
p13
# seuInt <- AddTSNE(seuInt) SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2,
# combine=T, text_size=15)
Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
seuInt <- AddTSNE(seuInt, n_comp = 2)
library(patchwork)
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, font_family = "serif", cols = cols_cluster) # Times New Roman
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, font_family = "serif")
p1 + p2
Combined differential expression analysis
library(Seurat)
dat_deg <- FindAllMarkers(seuInt)
library(dplyr)
n <- 10
dat_deg %>%
group_by(cluster) %>%
top_n(n = n, wt = avg_log2FC) -> top10
seuInt <- ScaleData(seuInt)
seus <- subset(seuInt, downsample = 400)
Plot DE genes’ heatmap for each spatial domains identified by PRECAST.
color_id <- as.numeric(levels(Idents(seus)))
library(ggplot2)
## HeatMap
p1 <- doHeatmap(seus, features = top10$gene, cell_label = "Domain", grp_label = F, grp_color = cols_cluster[color_id],
pt_size = 6, slot = "scale.data") + theme(legend.text = element_text(size = 10), legend.title = element_text(size = 13,
face = "bold"), axis.text.y = element_text(size = 5, face = "italic", family = "serif"))
p1
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252
#> [3] LC_MONETARY=English_Singapore.1252 LC_NUMERIC=C
#> [5] LC_TIME=English_Singapore.1252
#> system code page: 936
#>
#> attached base packages:
#> [1] parallel stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggplot2_3.3.6 dplyr_1.0.9 patchwork_1.1.1 sp_1.5-0
#> [5] SeuratObject_4.1.0 Seurat_4.1.1 PRECAST_1.3 gtools_3.9.2.2
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.2 reticulate_1.25
#> [3] tidyselect_1.1.2 htmlwidgets_1.5.4
#> [5] grid_4.1.2 BiocParallel_1.28.3
#> [7] Rtsne_0.16 munsell_0.5.0
#> [9] ScaledMatrix_1.2.0 codetools_0.2-18
#> [11] ragg_1.2.2 ica_1.0-2
#> [13] future_1.26.1 miniUI_0.1.1.1
#> [15] withr_2.5.0 spatstat.random_2.2-0
#> [17] colorspace_2.0-3 progressr_0.10.1
#> [19] Biobase_2.54.0 highr_0.9
#> [21] knitr_1.37 rstudioapi_0.13
#> [23] stats4_4.1.2 SingleCellExperiment_1.16.0
#> [25] ROCR_1.0-11 tensor_1.5
#> [27] listenv_0.8.0 labeling_0.4.2
#> [29] MatrixGenerics_1.6.0 GenomeInfoDbData_1.2.7
#> [31] polyclip_1.10-0 farver_2.1.0
#> [33] rprojroot_2.0.3 parallelly_1.32.0
#> [35] vctrs_0.4.1 generics_0.1.2
#> [37] xfun_0.29 ggthemes_4.2.4
#> [39] R6_2.5.1 GenomeInfoDb_1.30.1
#> [41] ggbeeswarm_0.6.0 rsvd_1.0.5
#> [43] bitops_1.0-7 spatstat.utils_2.3-1
#> [45] cachem_1.0.6 DelayedArray_0.20.0
#> [47] assertthat_0.2.1 promises_1.2.0.1
#> [49] scales_1.2.0 rgeos_0.5-9
#> [51] beeswarm_0.4.0 gtable_0.3.0
#> [53] beachmat_2.10.0 globals_0.15.0
#> [55] goftest_1.2-3 rlang_1.0.2
#> [57] systemfonts_1.0.4 splines_4.1.2
#> [59] lazyeval_0.2.2 spatstat.geom_2.4-0
#> [61] yaml_2.3.5 reshape2_1.4.4
#> [63] abind_1.4-5 httpuv_1.6.5
#> [65] tools_4.1.2 ellipsis_0.3.2
#> [67] spatstat.core_2.4-4 jquerylib_0.1.4
#> [69] RColorBrewer_1.1-3 BiocGenerics_0.40.0
#> [71] ggridges_0.5.3 Rcpp_1.0.8.3
#> [73] plyr_1.8.7 sparseMatrixStats_1.6.0
#> [75] zlibbioc_1.40.0 purrr_0.3.4
#> [77] RCurl_1.98-1.6 rpart_4.1.16
#> [79] deldir_1.0-6 viridis_0.6.2
#> [81] pbapply_1.5-0 cowplot_1.1.1
#> [83] S4Vectors_0.32.3 zoo_1.8-10
#> [85] SummarizedExperiment_1.24.0 ggrepel_0.9.1
#> [87] cluster_2.1.2 fs_1.5.2
#> [89] magrittr_2.0.3 GiRaF_1.0.1
#> [91] data.table_1.14.2 scattermore_0.8
#> [93] lmtest_0.9-40 RANN_2.6.1
#> [95] fitdistrplus_1.1-8 matrixStats_0.62.0
#> [97] mime_0.12 evaluate_0.15
#> [99] xtable_1.8-4 mclust_5.4.10
#> [101] IRanges_2.28.0 gridExtra_2.3
#> [103] compiler_4.1.2 scater_1.25.1
#> [105] tibble_3.1.7 KernSmooth_2.23-20
#> [107] crayon_1.5.1 htmltools_0.5.2
#> [109] mgcv_1.8-39 later_1.3.0
#> [111] tidyr_1.2.0 DBI_1.1.2
#> [113] formatR_1.11 MASS_7.3-55
#> [115] Matrix_1.4-0 cli_3.2.0
#> [117] igraph_1.3.2 DR.SC_3.0
#> [119] GenomicRanges_1.46.1 pkgconfig_2.0.3
#> [121] pkgdown_2.0.2 plotly_4.10.0
#> [123] scuttle_1.4.0 spatstat.sparse_2.1-1
#> [125] vipor_0.4.5 bslib_0.3.1
#> [127] XVector_0.34.0 CompQuadForm_1.4.3
#> [129] stringr_1.4.0 digest_0.6.29
#> [131] sctransform_0.3.3 RcppAnnoy_0.0.19
#> [133] spatstat.data_2.2-0 rmarkdown_2.11
#> [135] leiden_0.4.2 uwot_0.1.11
#> [137] DelayedMatrixStats_1.16.0 shiny_1.7.1
#> [139] lifecycle_1.0.1 nlme_3.1-155
#> [141] jsonlite_1.8.0 BiocNeighbors_1.12.0
#> [143] limma_3.50.1 desc_1.4.0
#> [145] viridisLite_0.4.0 fansi_1.0.3
#> [147] pillar_1.7.0 lattice_0.20-45
#> [149] fastmap_1.1.0 httr_1.4.3
#> [151] survival_3.2-13 glue_1.6.2
#> [153] png_0.1-7 stringi_1.7.6
#> [155] sass_0.4.1 textshaping_0.3.6
#> [157] BiocSingular_1.10.0 memoise_2.0.1
#> [159] irlba_2.3.5 future.apply_1.9.0