PRECAST: simulation

This vignette introduces the PRECAST workflow for integrating multiple spatial transcriptomics datasets. The workflow consists of three steps:

1. Independent preprocessing and model setting
2. Probabilistic embedding, clustering and alignment using the PRECAST model
3. Downstream analysis (i.e. visualization of clusters and embeddings, combined differential expression analysis)

We demonstrate the use of PRECAST on three simulated Visium datasets, which can be downloaded to the current working directory with the following commands:

githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/data_simu.rda?raw=true"
download.file(githubURL,"data_simu.rda",mode='wb')

Then load the data into R:

load("data_simu.rda")

The package can be loaded with the command:

library(PRECAST)
#> Loading required package: parallel
#> Loading required package: gtools
#> PRECAST :  An efficient data integration method is provided for multiple spatial transcriptomics data with non-cluster-relevant effects such as the complex batch effects. It unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, requiring only partially shared cell/domain clusters across datasets. More details can be referred to Wei Liu, et al. (2023) <doi:10.1038/s41467-023-35947-w>.   Check out our Package website (https://feiyoung.github.io/PRECAST/index.html) for a more complete description of the methods and analyses
library(Seurat)
#> Warning: package 'Seurat' was built under R version 4.1.3
#> Attaching SeuratObject
#> Attaching sp

Load the simulated data

First, we view the three simulated spatial transcriptomics datasets from the Visium platform.

data_simu ## a list of three Seurat objects with default assay: RNA
#> [[1]]
#> An object of class Seurat 
#> 2000 features across 4226 samples within 1 assay 
#> Active assay: RNA (2000 features, 0 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 2000 features across 3661 samples within 1 assay 
#> Active assay: RNA (2000 features, 0 variable features)
#> 
#> [[3]]
#> An object of class Seurat 
#> 2000 features across 3639 samples within 1 assay 
#> Active assay: RNA (2000 features, 0 variable features)

Check the content in data_simu.

head(data_simu[[1]])

row.names(data_simu[[1]])[1:10]

Create a PRECASTObject object

We show how to create a PRECASTObject object step by step. First, we create a Seurat list object using the count matrix and metadata of each data batch. Although data_simu is already a prepared Seurat list object, we re-create an identical object, seuList, to show the details.

Note: the spatial coordinates must be stored in the metadata in columns named row and col, so that PRECAST can identify them.
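As a minimal sketch of this naming requirement, the base R snippet below renames coordinate columns and checks them before the metadata is used. The helper `check_spatial_cols()` and the toy table `meta_toy` (with columns `x` and `y`) are hypothetical names introduced only for illustration; they are not part of PRECAST.

```r
# Hypothetical helper (not part of PRECAST): verify that a metadata table
# carries numeric spatial coordinates in columns named "row" and "col".
check_spatial_cols <- function(meta) {
  required <- c("row", "col")
  missing_cols <- setdiff(required, colnames(meta))
  if (length(missing_cols) > 0) {
    stop("metadata is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  non_numeric <- required[!vapply(meta[required], is.numeric, logical(1))]
  if (length(non_numeric) > 0) {
    stop("non-numeric coordinate column(s): ", paste(non_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy metadata whose coordinate columns have other names (illustrative only)
meta_toy <- data.frame(x = c(0, 50, 3), y = c(16, 102, 43),
                       row.names = paste0("spot", 1:3))

# Rename the columns to "row" and "col", then run the check
colnames(meta_toy)[match(c("x", "y"), colnames(meta_toy))] <- c("row", "col")
check_spatial_cols(meta_toy)
```

The check stops with an error message when a column is absent or not numeric, which is usually easier to diagnose than a failure later in the workflow.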

## Get the gene-by-spot read count matrices
## (direct access to the counts slot assumes a Seurat v4-style Assay object)
countList <- lapply(data_simu, function(x) x[["RNA"]]@counts)

## Check the spatial coordinates: Yes, they are named as "row" and "col"!
head(data_simu[[1]]@meta.data)
#>          orig.ident nCount_RNA nFeature_RNA row col sample true_cluster
#> S1_spot1         S1     179800         1117   0  16     S1            1
#> S1_spot2         S1     371165         1125  50 102     S1            3
#> S1_spot3         S1     753086         1108   3  43     S1            1
#> S1_spot4         S1     133468         1173  59  19     S1            7
#> S1_spot5         S1     127748         1113  43   9     S1            6
#> S1_spot6         S1     107114         1131  47  13     S1            6

## Get the meta data of each spot for each data batch
metadataList <- lapply(data_simu, function(x) x@meta.data)


## Ensure the row names of each metadata table in metadataList match the column names of the corresponding count matrix in countList
M <- length(countList)
for(r in 1:M){
  row.names(metadataList[[r]]) <- colnames(countList[[r]])
}


## Create the Seurat list  object

seuList <- list()
for(r in 1:M){
  seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data=metadataList[[r]], project = "PRECASTsimu")
}

Prepare the PRECASTObject with preprocessing step.

Next, we use CreatePRECASTObject() to create a PRECASTObject based on the Seurat list object seuList. This function will do three things:

1. Filter low-quality spots and genes, controlled by the arguments premin.features and premin.spots, respectively: spots in the raw data (seuList) are retained if they have at least premin.features features (genes) with nonzero counts, and genes are retained if they have nonzero counts in at least premin.spots spots. For ease of presentation, we denote the filtered Seurat list object as data_filter1.
2. Select the top 2,000 variable genes (by setting gene.number=2000) for each data batch, using the FindSVGs() function in the DR.SC package for spatially variable genes or the FindVariableFeatures() function in the Seurat package for highly variable genes. The genes are then prioritized by the number of samples in which they were selected as variable genes, and the top 2,000 are chosen. We denote the resulting Seurat list object, in which only 2,000 genes are retained, as data_filter2.
3. Conduct strict quality control on data_filter2 by filtering spots and genes, controlled by the arguments postmin.features and postmin.spots, respectively: spots are retained if they have at least postmin.features genes with nonzero counts, and genes are retained if they have nonzero counts in at least postmin.spots spots. Usually no genes are filtered at this step, because the retained genes are variable genes.

If the argument customGenelist is not NULL, this function performs only the third step, using the genes in customGenelist.
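The filtering rule of the first and third steps and the gene prioritization of the second step can be sketched in base R on toy data. This is an illustration under stated assumptions, not the code inside CreatePRECASTObject(): the helpers `filter_counts()` and `prioritize_genes()`, their argument names, the order of filtering (spots first, then genes), and the toy inputs are all hypothetical.

```r
# Toy gene-by-spot count matrix (illustrative data, not from the vignette)
set.seed(1)
counts <- matrix(rpois(6 * 8, lambda = 0.7), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("spot", 1:8)))

# Hypothetical helper: keep spots with at least `min_features` nonzero genes,
# then keep genes detected in at least `min_spots` of the remaining spots.
filter_counts <- function(counts, min_features, min_spots) {
  keep_spots <- colSums(counts > 0) >= min_features
  counts <- counts[, keep_spots, drop = FALSE]
  keep_genes <- rowSums(counts > 0) >= min_spots
  counts[keep_genes, , drop = FALSE]
}

filtered <- filter_counts(counts, min_features = 2, min_spots = 2)
dim(filtered)

# Hypothetical helper: rank genes by how many batches selected them as
# variable genes, and keep the `n_top` most frequently selected ones.
prioritize_genes <- function(gene_lists, n_top) {
  freq <- sort(table(unlist(gene_lists)), decreasing = TRUE)
  head(names(freq), n_top)
}

# Made-up per-batch selections of variable genes
selected <- list(
  batch1 = c("gene1", "gene2", "gene3"),
  batch2 = c("gene2", "gene3", "gene4"),
  batch3 = c("gene3", "gene5", "gene2")
)
prioritize_genes(selected, n_top = 2)
```

In this toy input, gene2 and gene3 are selected in all three batches and are therefore the two genes kept. How ties between equally frequent genes are broken is not specified here and may differ in the package.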

In this simulated dataset, we do not need to select genes, so we set customGenelist=row.names(seuList[[1]]) as the user-defined gene list. Users can retain the raw Seurat list object by setting rawData.preserve = TRUE.


## Create PRECASTObject
set.seed(2022)
PRECASTObj <-  CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]))
#> Filter spots and features from Raw count data...
#>  
#> 
#> CreatePRECASTObject: remove genes:gene252  gene644  gene1235  gene1488  with low count reads in seuList.
#> Filter spots and features from SVGs(HVGs) count data...

## Users can retain the raw seuList with the following command.
##  PRECASTObj <-  CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)

Fit PRECAST using simulated data

Add the model setting

Add the adjacency matrix list and the parameter settings of PRECAST. More model-setting parameters can be found in model_set().

## check the number of genes/features after filtering step
PRECASTObj@seulist
#> [[1]]
#> An object of class Seurat 
#> 1996 features across 4226 samples within 1 assay 
#> Active assay: RNA (1996 features, 0 variable features)
#> 
#> [[2]]
#> An object of class Seurat 
#> 1996 features across 3661 samples within 1 assay 
#> Active assay: RNA (1996 features, 0 variable features)
#> 
#> [[3]]
#> An object of class Seurat 
#> 1996 features across 3639 samples within 1 assay 
#> Active assay: RNA (1996 features, 0 variable features)

## seuList is NULL since the default value of `rawData.preserve` is FALSE.
PRECASTObj@seuList
#> NULL

## Add adjacency matrix list for a PRECASTObj object to prepare for PRECAST model fitting.
PRECASTObj <-  AddAdjList(PRECASTObj, platform = "Visium")
#> Neighbors were identified for 4226 out of 4226 spots.
#> Neighbors were identified for 3658 out of 3661 spots.
#> Neighbors were identified for 3638 out of 3639 spots.

## Add a model setting in advance for a PRECASTObj object.
## verbose = TRUE prints progress information from the algorithm;
## coreNum sets how many cores PRECAST uses. If you run PRECAST for
## several numbers of clusters, you can use multiple cores; otherwise, set it to 1.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal=FALSE, maxIter=30, verbose=TRUE,
                             coreNum =1)
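The output of AddAdjList() above reports that a few spots have no identified neighbors. To make the idea of a spot adjacency matrix concrete, here is a minimal base R sketch. It is not the implementation used by PRECAST: the helper `visium_adjacency()` is hypothetical, and it assumes Visium-style array coordinates in which the neighbors of a spot sit at (row, col ± 2) or (row ± 1, col ± 1).

```r
# Hypothetical helper (not PRECAST's implementation): build a binary adjacency
# matrix for spots on a Visium-style hexagonal grid, assuming array coordinates
# where neighbours sit at (row, col +/- 2) or (row +/- 1, col +/- 1).
visium_adjacency <- function(row, col) {
  d_row <- abs(outer(row, row, "-"))
  d_col <- abs(outer(col, col, "-"))
  adj <- (d_row == 0 & d_col == 2) | (d_row == 1 & d_col == 1)
  storage.mode(adj) <- "integer"
  adj
}

# Toy layout: a centre spot with its six neighbours plus one isolated spot
row <- c(1, 1, 1, 0, 0, 2, 2, 10)
col <- c(3, 1, 5, 2, 4, 2, 4, 20)
adj <- visium_adjacency(row, col)

n_neighbors <- rowSums(adj)
n_neighbors[1]         # the centre spot touches six spots
sum(n_neighbors > 0)   # number of spots with at least one neighbour
```

The dense n-by-n matrix is only practical for small examples; for thousands of spots a sparse representation would be the natural choice.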

Fit PRECAST

For the function PRECAST(), users can specify a single number of clusters \(K\), or set K to an integer vector, in which case a modified BIC (MBIC) is used to determine \(K\). For convenience, we give a single K here.

### Given K
set.seed(2022)
PRECASTObj <- PRECAST(PRECASTObj, K=7)
#> Intergrative data info.: 3 samples, 1996 genes X 11526 spots------
#> PRECAST model setting: error_heter=TRUE, Sigma_equal=FALSE, Sigma_diag=TRUE, mix_prop_heter=TRUE
#> Start computing intial values...
#> fitting ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> ----Fitting PRECAST model----------------
#> variable initialize finish! 
#> predict Y and V! 
#> Finish ICM step! 
#> iter = 2, loglik= 9719468.000000, dloglik=1.004526 
#> predict Y and V! 
#> diff Energy = 0.006069 
#> Finish ICM step! 
#> iter = 3, loglik= 10164475.000000, dloglik=0.045785 
#> predict Y and V! 
#> diff Energy = 20.659333 
#> diff Energy = 0.004862 
#> diff Energy = 5.052295 
#> Finish ICM step! 
#> iter = 4, loglik= 10207532.000000, dloglik=0.004236 
#> predict Y and V! 
#> diff Energy = 10.501516 
#> diff Energy = 2.049250 
#> diff Energy = 1.903215 
#> Finish ICM step! 
#> iter = 5, loglik= 10227480.000000, dloglik=0.001954 
#> predict Y and V! 
#> diff Energy = 9.237242 
#> diff Energy = 4.634927 
#> diff Energy = 3.484822 
#> Finish ICM step! 
#> iter = 6, loglik= 10238996.000000, dloglik=0.001126 
#> predict Y and V! 
#> diff Energy = 1.707339 
#> diff Energy = 1.681608 
#> diff Energy = 3.646628 
#> Finish ICM step! 
#> iter = 7, loglik= 10245930.000000, dloglik=0.000677 
#> predict Y and V! 
#> diff Energy = 2.308845 
#> diff Energy = 4.058903 
#> diff Energy = 2.885230 
#> Finish ICM step! 
#> iter = 8, loglik= 10250788.000000, dloglik=0.000474 
#> predict Y and V! 
#> diff Energy = 6.490700 
#> diff Energy = 4.139849 
#> diff Energy = 4.814719 
#> Finish ICM step! 
#> iter = 9, loglik= 10254287.000000, dloglik=0.000341 
#> predict Y and V! 
#> diff Energy = 5.705531 
#> diff Energy = 9.523470 
#> diff Energy = 5.677776 
#> Finish ICM step! 
#> iter = 10, loglik= 10257033.000000, dloglik=0.000268 
#> predict Y and V! 
#> diff Energy = 6.708684 
#> diff Energy = 0.188796 
#> diff Energy = 5.392532 
#> Finish ICM step! 
#> iter = 11, loglik= 10259326.000000, dloglik=0.000224 
#> predict Y and V! 
#> diff Energy = 3.429478 
#> diff Energy = 0.984060 
#> diff Energy = 6.407946 
#> Finish ICM step! 
#> iter = 12, loglik= 10261334.000000, dloglik=0.000196 
#> predict Y and V! 
#> diff Energy = 1.855086 
#> diff Energy = 5.314149 
#> diff Energy = 2.846484 
#> Finish ICM step! 
#> iter = 13, loglik= 10263226.000000, dloglik=0.000184 
#> predict Y and V! 
#> diff Energy = 1.494579 
#> diff Energy = 0.438796 
#> diff Energy = 3.401270 
#> Finish ICM step! 
#> iter = 14, loglik= 10265158.000000, dloglik=0.000188 
#> predict Y and V! 
#> diff Energy = 11.342860 
#> diff Energy = 2.291280 
#> diff Energy = 1.545057 
#> Finish ICM step! 
#> iter = 15, loglik= 10267070.000000, dloglik=0.000186 
#> predict Y and V! 
#> diff Energy = 2.709225 
#> diff Energy = 0.063746 
#> diff Energy = 0.072490 
#> Finish ICM step! 
#> iter = 16, loglik= 10269118.000000, dloglik=0.000199 
#> predict Y and V! 
#> diff Energy = 5.622516 
#> diff Energy = 1.090269 
#> diff Energy = 2.002848 
#> Finish ICM step! 
#> iter = 17, loglik= 10270980.000000, dloglik=0.000181 
#> predict Y and V! 
#> diff Energy = 3.716381 
#> diff Energy = 3.652518 
#> diff Energy = 1.892329 
#> Finish ICM step! 
#> iter = 18, loglik= 10272679.000000, dloglik=0.000165 
#> predict Y and V! 
#> diff Energy = 8.171563 
#> diff Energy = 0.957167 
#> diff Energy = 0.894462 
#> Finish ICM step! 
#> iter = 19, loglik= 10273944.000000, dloglik=0.000123 
#> predict Y and V! 
#> diff Energy = 5.731109 
#> diff Energy = 0.911039 
#> diff Energy = 2.370751 
#> Finish ICM step! 
#> iter = 20, loglik= 10275015.000000, dloglik=0.000104 
#> predict Y and V! 
#> diff Energy = 7.285983 
#> diff Energy = 0.871756 
#> diff Energy = 2.097420 
#> Finish ICM step! 
#> iter = 21, loglik= 10276038.000000, dloglik=0.000100 
#> predict Y and V! 
#> diff Energy = 8.421813 
#> diff Energy = 0.834700 
#> diff Energy = 1.281597 
#> Finish ICM step! 
#> iter = 22, loglik= 10276963.000000, dloglik=0.000090 
#> predict Y and V! 
#> diff Energy = 11.606312 
#> diff Energy = 0.798970 
#> diff Energy = 0.732148 
#> Finish ICM step! 
#> iter = 23, loglik= 10277890.000000, dloglik=0.000090 
#> predict Y and V! 
#> diff Energy = 6.431231 
#> diff Energy = 0.761082 
#> diff Energy = 1.249812 
#> Finish ICM step! 
#> iter = 24, loglik= 10278838.000000, dloglik=0.000092 
#> predict Y and V! 
#> diff Energy = 12.920729 
#> diff Energy = 0.727401 
#> diff Energy = 1.142387 
#> Finish ICM step! 
#> iter = 25, loglik= 10279785.000000, dloglik=0.000092 
#> predict Y and V! 
#> diff Energy = 1.803717 
#> diff Energy = 0.685884 
#> diff Energy = 5.081954 
#> Finish ICM step! 
#> iter = 26, loglik= 10280792.000000, dloglik=0.000098 
#> predict Y and V! 
#> diff Energy = 10.498935 
#> diff Energy = 0.634590 
#> diff Energy = 0.160987 
#> Finish ICM step! 
#> iter = 27, loglik= 10281822.000000, dloglik=0.000100 
#> predict Y and V! 
#> diff Energy = 7.710985 
#> diff Energy = 0.568661 
#> diff Energy = 1.427724 
#> Finish ICM step! 
#> iter = 28, loglik= 10282816.000000, dloglik=0.000097 
#> predict Y and V! 
#> diff Energy = 6.980169 
#> diff Energy = 0.510189 
#> diff Energy = 0.956927 
#> Finish ICM step! 
#> iter = 29, loglik= 10283801.000000, dloglik=0.000096 
#> predict Y and V! 
#> diff Energy = 6.280589 
#> diff Energy = 0.458026 
#> diff Energy = 3.223541 
#> Finish ICM step! 
#> iter = 30, loglik= 10284767.000000, dloglik=0.000094
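The dloglik values printed in the log are consistent with the relative change of the log-likelihood between consecutive iterations. The sketch below checks this arithmetic on the first few printed values; it is an observation about the printed numbers, not a description of the package internals, and the threshold `tol` is an illustrative value rather than a PRECAST default.

```r
# Log-likelihood values copied from the first iterations of the log above
loglik <- c(iter2 = 9719468, iter3 = 10164475, iter4 = 10207532, iter5 = 10227480)

# Relative change between consecutive iterations
rel_change <- abs(diff(loglik)) / abs(head(loglik, -1))
round(rel_change, 6)

# Illustrative convergence check (tol is a made-up threshold)
tol <- 1e-3
any(rel_change < tol)
```

The rounded values for iterations 3, 4 and 5 agree with the printed dloglik values 0.045785, 0.004236 and 0.001954.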

Other options

Run for multiple K. Here, we set K=6:7.

## Reset the parameters, increasing the number of cores.
PRECASTObj2 <- AddParSetting(PRECASTObj, Sigma_equal=FALSE, maxIter=30, verbose=TRUE,
                             coreNum =2)
set.seed(2023)
PRECASTObj2 <- PRECAST(PRECASTObj2, K=6:7)

resList2 <- PRECASTObj2@resList
PRECASTObj2 <- SelectModel(PRECASTObj2)

Note: For parallel computation based on Rcpp on Linux, users need to run the following system command to make the C stack size unlimited if R reports Error: C stack usage is too close to the limit.

ulimit -s unlimited
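Before changing the limit, it can help to look at the C stack that the current R session sees. A minimal sketch using base R's Cstack_info(), which needs no extra packages:

```r
# Report the C stack size (in bytes) available to this R session and how much
# of it is currently in use; both are elements of the named integer vector
# returned by base R's Cstack_info().
stack <- Cstack_info()
stack[c("size", "current")]
```

On some platforms the size may be reported as NA, in which case this check is not informative.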

Users can also choose a different initialization method by setting int.model, for example int.model=NULL; see the functions AddParSetting() and model_set() for more details.

Select the best model and re-organize the results using SelectModel(). Even when K is not a vector, it is still necessary to run SelectModel() to re-organize the results in PRECASTObj. The output of str(PRECASTObj@resList) after SelectModel() shows that the selected best K is 7.

## Check the fitted results: resList holds one list of fitted results per candidate K (here only K = 7).
str(PRECASTObj@resList)
#> List of 1
#>  $ :List of 12
#>   ..$ cluster   :List of 3
#>   .. ..$ : num [1:4226, 1] 4 3 4 2 4 4 4 1 4 3 ...
#>   .. ..$ : num [1:3661, 1] 4 4 3 1 4 3 3 1 4 2 ...
#>   .. ..$ : num [1:3639, 1] 3 4 2 3 1 4 2 3 4 4 ...
#>   .. ..- attr(*, "dim")= int [1:2] 3 1
#>   ..$ hZ        :List of 3
#>   .. ..$ : num [1:4226, 1:15] 0.369 0.403 0.373 0.316 0.338 ...
#>   .. ..$ : num [1:3661, 1:15] 0.334 0.43 0.433 0.456 0.368 ...
#>   .. ..$ : num [1:3639, 1:15] 0.465 0.367 0.307 0.399 0.542 ...
#>   .. ..- attr(*, "dim")= int [1:2] 3 1
#>   ..$ hV        :List of 3
#>   .. ..$ : num [1:4226, 1:15] -3.13 -10.8 -18.53 4.48 4.23 ...
#>   .. ..$ : num [1:3661, 1:15] -13.17 8.07 5.65 2.97 -1.81 ...
#>   .. ..$ : num [1:3639, 1:15] -4.846 1.005 9.527 -0.209 0.529 ...
#>   .. ..- attr(*, "dim")= int [1:2] 3 1
#>   ..$ Rf        :List of 3
#>   .. ..$ : num [1:4226, 1:7] 2.80e-04 8.10e-04 1.60e-05 2.10e-18 4.32e-07 ...
#>   .. ..$ : num [1:3661, 1:7] 5.64e-06 1.50e-01 1.83e-06 9.44e-01 1.66e-07 ...
#>   .. ..$ : num [1:3639, 1:7] 2.98e-04 4.42e-03 1.32e-09 1.54e-04 9.85e-01 ...
#>   .. ..- attr(*, "dim")= int [1:2] 3 1
#>   ..$ beta      : num [1:3, 1] 3 4 2
#>   ..$ Mu        : num [1:7, 1:15] 0.456 0.295 0.455 0.389 -11.064 ...
#>   ..$ Sigma     : num [1:15, 1:15, 1:7] 0.0258 0 0 0 0 ...
#>   ..$ Psi       : num [1:15, 1:15, 1:3] 87.309 30.931 2.082 0.629 1.581 ...
#>   ..$ W         : num [1:1996, 1:15] 0.01266 0.0066 0.01432 0.02175 0.00806 ...
#>   ..$ Lam       : num [1:3, 1:1996] 0.236 1.661 0.2006 0.015 0.0237 ...
#>   ..$ loglik    : num 10284767
#>   ..$ loglik_seq: num [1:30, 1] -2.15e+09 9.72e+06 1.02e+07 1.02e+07 1.02e+07 ...
#>  - attr(*, "para_settings")=List of 8
#>   ..$ K             : num 7
#>   ..$ n             : int 11526
#>   ..$ p             : int 1996
#>   ..$ q             : num 15
#>   ..$ r_max         : int 3
#>   ..$ Sigma_equal   : logi FALSE
#>   ..$ Sigma_diag    : logi TRUE
#>   ..$ mix_prop_heter: logi TRUE
#>  - attr(*, "class")= chr "SeqK_PRECAST_Object"
## backup the fitted results in resList
resList <- PRECASTObj@resList
# PRECASTObj@resList <- resList
PRECASTObj <- SelectModel(PRECASTObj)
## check the best and re-organized results
str(PRECASTObj@resList) ## The selected best K is 7
#> List of 7
#>  $ bestK  : num 7
#>  $ cluster:List of 3
#>   ..$ : num [1:4226, 1] 4 3 4 2 4 4 4 1 4 3 ...
#>   ..$ : num [1:3661, 1] 4 4 3 1 4 3 3 1 4 2 ...
#>   ..$ : num [1:3639, 1] 3 4 2 3 1 4 2 3 4 4 ...
#>   ..- attr(*, "dim")= int [1:2] 3 1
#>  $ hZ     :List of 3
#>   ..$ : num [1:4226, 1:15] 0.369 0.403 0.373 0.316 0.338 ...
#>   ..$ : num [1:3661, 1:15] 0.334 0.43 0.433 0.456 0.368 ...
#>   ..$ : num [1:3639, 1:15] 0.465 0.367 0.307 0.399 0.542 ...
#>   ..- attr(*, "dim")= int [1:2] 3 1
#>  $ Rf     :List of 3
#>   ..$ : num [1:4226, 1:7] 2.80e-04 8.10e-04 1.60e-05 2.10e-18 4.32e-07 ...
#>   ..$ : num [1:3661, 1:7] 5.64e-06 1.50e-01 1.83e-06 9.44e-01 1.66e-07 ...
#>   ..$ : num [1:3639, 1:7] 2.98e-04 4.42e-03 1.32e-09 1.54e-04 9.85e-01 ...
#>   ..- attr(*, "dim")= int [1:2] 3 1
#>  $ hV     :List of 3
#>   ..$ : num [1:4226, 1:15] -3.13 -10.8 -18.53 4.48 4.23 ...
#>   ..$ : num [1:3661, 1:15] -13.17 8.07 5.65 2.97 -1.81 ...
#>   ..$ : num [1:3639, 1:15] -4.846 1.005 9.527 -0.209 0.529 ...
#>   ..- attr(*, "dim")= int [1:2] 3 1
#>  $ hW     : num [1:1996, 1:15] 0.01266 0.0066 0.01432 0.02175 0.00806 ...
#>  $ icMat  : num [1, 1:2] 7 -19800576
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "K" "IC"
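The icMat entry pairs each candidate K with the value of its information criterion. As a minimal sketch of how a best K could be read from such a table, assume that the smallest criterion value is preferred; the matrix `ic_mat` and its IC values below are made up for illustration.

```r
# Toy criterion table in the same two-column layout as icMat above.
# The IC values are invented for illustration.
ic_mat <- cbind(K = 6:9, IC = c(120, 95, 101, 110))

# Assumption: the preferred K is the one with the smallest criterion value
best_k <- ic_mat[which.min(ic_mat[, "IC"]), "K"]
best_k
```

With a single candidate K, as in the run above, the table has one row and the choice is trivial.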

Use ARI to check the performance of clustering:

true_cluster <- lapply(PRECASTObj@seulist, function(x) x$true_cluster)
str(true_cluster)
#> List of 3
#>  $ : Named num [1:4226] 1 3 1 7 6 6 7 5 3 3 ...
#>   ..- attr(*, "names")= chr [1:4226] "S1_spot1" "S1_spot2" "S1_spot3" "S1_spot4" ...
#>  $ : Named num [1:3661] 4 6 3 4 6 3 3 4 6 7 ...
#>   ..- attr(*, "names")= chr [1:3661] "S1_spot1" "S1_spot2" "S1_spot3" "S1_spot4" ...
#>  $ : Named num [1:3639] 3 1 7 3 5 6 7 3 2 6 ...
#>   ..- attr(*, "names")= chr [1:3639] "S1_spot1" "S1_spot2" "S1_spot3" "S1_spot4" ...
mclust::adjustedRandIndex(unlist(PRECASTObj@resList$cluster), unlist(true_cluster))
#> [1] 0.5254209
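To show what the adjusted Rand index (ARI) measures, the sketch below writes out the standard pair-counting formula in base R. The function `adjusted_rand_index()` is a hypothetical illustration; mclust::adjustedRandIndex() remains the function used in this vignette.

```r
# Hypothetical base-R implementation of the adjusted Rand index, written out
# to show what the metric computes.
adjusted_rand_index <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)                          # contingency table of the two labelings
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both labelings
  sum_rows  <- sum(choose(rowSums(tab), 2))   # pairs grouped together in x
  sum_cols  <- sum(choose(colSums(tab), 2))   # pairs grouped together in y
  expected  <- sum_rows * sum_cols / choose(length(x), 2)
  maximum   <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (maximum - expected)
}

# Label names do not matter, only the grouping does
adjusted_rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))   # same partition, relabeled
adjusted_rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))   # partitions that disagree
```

The first call gives 1 because the two partitions are identical up to relabeling, and the second gives -0.5, below the value of 0 expected for random agreement. This is why the cluster labels estimated by PRECAST do not need to be matched to the true labels before computing the index. The sketch does not handle the degenerate case where both labelings put all spots in one cluster.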

We provide two methods to correct batch effects at the gene expression level. Method (1) uses only the PRECAST results to obtain the batch-corrected gene expressions; it applies if the species of the data is unknown, or if fewer than five housekeeping genes in the database overlap with the variable genes in PRECASTObj@seulist. Method (2) uses both housekeeping genes and the PRECAST results to obtain the batch-corrected gene expressions.

Note: to obtain batch-corrected gene expressions based on housekeeping genes as negative controls, users must specify the species of the data source and use gene symbols as gene names in PRECASTObj@seulist.

Integrate the three samples with the function IntegrateSpaData(). Because these are simulated data, we use Method (1) by setting species='unknown'.


seuInt <- IntegrateSpaData(PRECASTObj, species='unknown')
#> Using only PRECAST results to obtain the batch corrected gene expressions since species is unknown or the genelist in PRECASTObj has less than 5 overlapp with the housekeeping genes of given species.
#> Users can specify the custom_housekeep by themselves to use the housekeeping genes based methods.
seuInt 
#> An object of class Seurat 
#> 1996 features across 11526 samples within 1 assay 
#> Active assay: PRE_CAST (1996 features, 0 variable features)
#>  2 dimensional reductions calculated: PRECAST, position
## The low-dimensional embeddings obtained by PRECAST are saved in the PRECAST reduction slot.

Visualization

First, users can choose a color scheme with chooseColors().

cols_cluster <- chooseColors(palettes_name = 'Nature 10', n_colors = 7, plot_colors = TRUE)

Show the spatial scatter plot for clusters

p12 <- SpaPlot(seuInt, batch=NULL, cols=cols_cluster, point_size=2, combine=TRUE)
p12

# users can plot each sample by setting combine=FALSE

Users can re-plot the above figures for specific needs by returning a list of ggplot objects. For example, we plot the spatial heatmap of only the first two data batches.

pList <- SpaPlot(seuInt, batch=NULL, cols=cols_cluster, point_size=2, combine=FALSE, title_name=NULL)
drawFigs(pList[1:2], layout.dim = c(1,2), common.legend = TRUE, legend.position = 'right', align='hv')

Show the spatial UMAP/tSNE RGB plot

seuInt <- AddUMAP(seuInt) 
SpaPlot(seuInt, batch=NULL,item='RGB_UMAP',point_size=1, combine=TRUE, text_size=15)


## Plot tSNE RGB plot
#seuInt <- AddTSNE(seuInt) 
#SpaPlot(seuInt, batch=NULL,item='RGB_TSNE',point_size=2, combine=TRUE, text_size=15)

Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.

seuInt <- AddTSNE(seuInt, n_comp = 2) 

p1 <- dimPlot(seuInt, item='cluster', font_family='serif', cols=cols_cluster) # Times New Roman
p2 <- dimPlot(seuInt, item='batch', point_size = 1,  font_family='serif')
drawFigs(list(p1, p2), common.legend=FALSE, align='hv')

# It is noted that only sample batch 1 has cluster 4, and only sample batch 2 has cluster 7.

Show the UMAP plot based on the extracted features from PRECAST.

dimPlot(seuInt, reduction = 'UMAP3', item='cluster', cols=cols_cluster, font_family='serif')

Users can also use the visualization functions in Seurat package:

library(Seurat)
p1 <- DimPlot(seuInt[,1: 4226], reduction = 'position', cols=cols_cluster, pt.size =1) # plot the first data batch: first 4226 spots.
p2 <- DimPlot(seuInt, reduction = 'tSNE',cols=cols_cluster, pt.size=1)
drawFigs(list(p1, p2), layout.dim = c(1,2), common.legend = TRUE)

Combined differential expression analysis

dat_deg <- FindAllMarkers(seuInt)
#> Calculating cluster 1
#> Calculating cluster 2
#> Calculating cluster 3
#> Calculating cluster 4
#> Calculating cluster 5
#> Calculating cluster 6
#> Calculating cluster 7
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.1.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
n <- 2
## keep the n genes with the largest average log2 fold change in each cluster
dat_deg %>%
  group_by(cluster) %>%
  top_n(n = n, wt = avg_log2FC) -> top2

head(top2)
#> # A tibble: 6 x 7
#> # Groups:   cluster [3]
#>       p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene    
#>       <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   
#> 1 5.29e-  6       1.35 0.799 0.743 1.06e-  2 1       gene756 
#> 2 3.99e-  5       1.58 0.7   0.632 7.96e-  2 1       gene670 
#> 3 9.57e-166       2.71 0.973 0.811 1.91e-162 2       gene509 
#> 4 9.31e-140       2.76 0.968 0.872 1.86e-136 2       gene1565
#> 5 0               2.58 0.904 0.685 0         3       gene1097
#> 6 0               2.56 0.915 0.715 0         3       gene1797
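The same per-cluster selection can be written without dplyr. The sketch below uses base R on a toy table; the helper `top_markers()` and the values in `deg_toy` are invented for illustration and are not output of FindAllMarkers().

```r
# Toy marker table with the columns used for the selection
# (values invented for illustration)
deg_toy <- data.frame(
  cluster    = factor(c(1, 1, 1, 2, 2, 2)),
  gene       = paste0("gene", 1:6),
  avg_log2FC = c(1.35, 1.58, 0.40, 2.71, 2.76, 0.90)
)

# Hypothetical helper: keep the `n` rows with the largest avg_log2FC per cluster
top_markers <- function(deg, n) {
  per_cluster <- lapply(split(deg, deg$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], n)
  })
  do.call(rbind, per_cluster)
}

top_markers(deg_toy, n = 2)
```

Unlike top_n(), which keeps all tied rows, this sketch returns exactly n rows per cluster, so results can differ when fold changes are tied.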

Session Info

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.936 
#> [2] LC_CTYPE=Chinese (Simplified)_China.936   
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C                              
#> [5] LC_TIME=Chinese (Simplified)_China.936    
#> 
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.9        sp_1.5-0           SeuratObject_4.1.0 Seurat_4.1.1      
#> [5] PRECAST_1.6        gtools_3.9.2.2    
#> 
#> loaded via a namespace (and not attached):
#>   [1] utf8_1.2.3                  reticulate_1.25            
#>   [3] tidyselect_1.1.2            htmlwidgets_1.5.4          
#>   [5] grid_4.1.2                  BiocParallel_1.28.3        
#>   [7] Rtsne_0.16                  munsell_0.5.0              
#>   [9] ScaledMatrix_1.2.0          codetools_0.2-18           
#>  [11] ragg_1.2.2                  ica_1.0-2                  
#>  [13] future_1.26.1               miniUI_0.1.1.1             
#>  [15] withr_2.5.0                 spatstat.random_2.2-0      
#>  [17] colorspace_2.1-0            progressr_0.10.1           
#>  [19] Biobase_2.54.0              highr_0.9                  
#>  [21] knitr_1.37                  rstudioapi_0.13            
#>  [23] stats4_4.1.2                SingleCellExperiment_1.16.0
#>  [25] ROCR_1.0-11                 ggsignif_0.6.3             
#>  [27] tensor_1.5                  listenv_0.8.0              
#>  [29] labeling_0.4.2              MatrixGenerics_1.6.0       
#>  [31] GenomeInfoDbData_1.2.7      polyclip_1.10-0            
#>  [33] farver_2.1.1                rprojroot_2.0.3            
#>  [35] parallelly_1.32.0           vctrs_0.6.1                
#>  [37] generics_0.1.2              xfun_0.29                  
#>  [39] ggthemes_4.2.4              R6_2.5.1                   
#>  [41] GenomeInfoDb_1.30.1         ggbeeswarm_0.6.0           
#>  [43] rsvd_1.0.5                  bitops_1.0-7               
#>  [45] spatstat.utils_3.0-1        cachem_1.0.6               
#>  [47] DelayedArray_0.20.0         assertthat_0.2.1           
#>  [49] promises_1.2.0.1            scales_1.2.1               
#>  [51] rgeos_0.5-9                 beeswarm_0.4.0             
#>  [53] gtable_0.3.3                beachmat_2.10.0            
#>  [55] globals_0.15.0              goftest_1.2-3              
#>  [57] rlang_1.1.0                 systemfonts_1.0.4          
#>  [59] splines_4.1.2               rstatix_0.7.0              
#>  [61] lazyeval_0.2.2              broom_0.7.12               
#>  [63] spatstat.geom_2.4-0         yaml_2.3.6                 
#>  [65] reshape2_1.4.4              abind_1.4-5                
#>  [67] backports_1.4.1             httpuv_1.6.5               
#>  [69] tools_4.1.2                 ggplot2_3.4.1              
#>  [71] ellipsis_0.3.2              spatstat.core_2.4-4        
#>  [73] jquerylib_0.1.4             RColorBrewer_1.1-3         
#>  [75] BiocGenerics_0.40.0         ggridges_0.5.3             
#>  [77] Rcpp_1.0.10                 plyr_1.8.7                 
#>  [79] sparseMatrixStats_1.6.0     zlibbioc_1.40.0            
#>  [81] purrr_0.3.4                 RCurl_1.98-1.6             
#>  [83] ggpubr_0.4.0                rpart_4.1.16               
#>  [85] deldir_1.0-6                pbapply_1.5-0              
#>  [87] viridis_0.6.2               cowplot_1.1.1              
#>  [89] S4Vectors_0.32.3            zoo_1.8-10                 
#>  [91] SummarizedExperiment_1.24.0 ggrepel_0.9.1              
#>  [93] cluster_2.1.2               fs_1.5.2                   
#>  [95] magrittr_2.0.3              GiRaF_1.0.1                
#>  [97] RSpectra_0.16-1             data.table_1.14.2          
#>  [99] scattermore_0.8             lmtest_0.9-40              
#> [101] RANN_2.6.1                  fitdistrplus_1.1-8         
#> [103] matrixStats_0.62.0          patchwork_1.1.1            
#> [105] mime_0.12                   evaluate_0.15              
#> [107] xtable_1.8-4                mclust_5.4.10              
#> [109] IRanges_2.28.0              gridExtra_2.3              
#> [111] compiler_4.1.2              scater_1.25.1              
#> [113] tibble_3.2.1                KernSmooth_2.23-20         
#> [115] crayon_1.5.1                htmltools_0.5.2            
#> [117] mgcv_1.8-39                 later_1.3.0                
#> [119] tidyr_1.2.0                 DBI_1.1.2                  
#> [121] MASS_7.3-55                 car_3.0-12                 
#> [123] Matrix_1.4-0                cli_3.2.0                  
#> [125] igraph_1.3.5                DR.SC_3.2                  
#> [127] GenomicRanges_1.46.1        pkgconfig_2.0.3            
#> [129] pkgdown_2.0.6               plotly_4.10.0              
#> [131] scuttle_1.4.0               spatstat.sparse_2.1-1      
#> [133] vipor_0.4.5                 bslib_0.3.1                
#> [135] XVector_0.34.0              CompQuadForm_1.4.3         
#> [137] stringr_1.4.0               digest_0.6.29              
#> [139] sctransform_0.3.3           RcppAnnoy_0.0.19           
#> [141] spatstat.data_3.0-0         rmarkdown_2.11             
#> [143] leiden_0.4.2                uwot_0.1.11                
#> [145] DelayedMatrixStats_1.16.0   shiny_1.7.1                
#> [147] lifecycle_1.0.3             nlme_3.1-155               
#> [149] jsonlite_1.8.0              carData_3.0-5              
#> [151] BiocNeighbors_1.12.0        limma_3.50.1               
#> [153] desc_1.4.0                  viridisLite_0.4.1          
#> [155] fansi_1.0.4                 pillar_1.9.0               
#> [157] lattice_0.20-45             fastmap_1.1.0              
#> [159] httr_1.4.3                  survival_3.2-13            
#> [161] glue_1.6.2                  png_0.1-7                  
#> [163] stringi_1.7.6               sass_0.4.1                 
#> [165] textshaping_0.3.6           BiocSingular_1.10.0        
#> [167] memoise_2.0.1               irlba_2.3.5                
#> [169] future.apply_1.9.0

Wei Liu