ICM-EM algorithm implementation with organized paramters

Efficient data integration as well as spatial clustering for multiple spatial transcriptomics data

ICM.EM_structure(XList,  K, AdjList, q=15,parameterList=NULL)

Arguments

XList: an M-length list consisting of multiple matrices with class dgCMatrix or matrix that specify the log-normalization gene expression matrix for each data sample used for PRECAST model.
K: a positive integer allowing scalar or vector, specify the number of clusters in model fitting.
AdjList: an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for Potts model and Intrisic CAR model in PRECAST model. We provide this interface for those users who would like to define the adjacency matrix by their own.
q: a positive integer, specify the number of latent features to be extracted, default as 15.
parameterList: Other arguments in PRECAST model, it can be set by model_set.

Details

Nothing

Value

ICM.EM_structure returns a list with class "SeqK_PRECAST_Object" with the number of components equal to the length of K, where each component includes the model fitting results for one number of cluster and is a list consisting of following components:

cluster: an M-length list that includes the inferred class labels for each data sample.
hZ: an M-length list that includes the batch corrected low-dimensional embeddings for each data sample.
hV: an M-length list that includes the estimate the ICAR component for each sample.
Rf: an M-length list that includes the posterior probability of domain clusters for each sample.
beta: an M-length vector that includes the estimated smoothing parameters for each sample.
Mu: mean vectors of mixtures components.
Sigma: covariance matrix of mixtures components.
W: estimated loading matrix
Lam: estimated variance of errors in probabilistic PCA model
loglik: pseudo observed log-likelihood.

References

Wei Liu, Liao, X., Luo, Z. et al, Jin Liu* (2023). Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nature Communications, 14, 296

Author

Wei Liu

Note

nothing

Examples

# \donttest{
  ## we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform.
  library(Matrix)
  q <- 10; K <- 4
  data(PRECASTObj)
  posList <- lapply(PRECASTObj@seulist, function(x) cbind(x$row, x$col))
  AdjList <- lapply(posList, getAdj_reg, platform='ST')
#> Neighbors were identified for 900 out of 900 spots.
#> Neighbors were identified for 900 out of 900 spots.
  XList <- lapply(PRECASTObj@seulist, function(x) t(x[['RNA']]@data))
  XList <- lapply(XList, scale, scale=FALSE)
  parList <- model_set(maxIter=4)
  resList <- ICM.EM_structure(XList,  AdjList = AdjList, 
                   q=q, K=K, parameterList=parList)
#> Intergrative data info.: 2 samples, 97 genes X 1800 spots------
#> PRECAST model setting: error_heter=TRUE, Sigma_equal=FALSE, Sigma_diag=TRUE, mix_prop_heter=TRUE
#> Start computing intial values... 
#> fitting ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> ----Fitting PRECAST model----------------
#> variable initialize finish! 
#> predict Y and V! 
#> diff Energy = 44.205532 
#> Finish ICM step! 
#> iter = 2, loglik= -59510.035156, dloglik=0.999972 
#> predict Y and V! 
#> diff Energy = 12.547775 
#> diff Energy = 3.939711 
#> Finish ICM step! 
#> iter = 3, loglik= -53027.710938, dloglik=0.108928 
#> predict Y and V! 
#> diff Energy = 7.640621 
#> diff Energy = 1.688112 
#> Finish ICM step! 
#> iter = 4, loglik= -50711.550781, dloglik=0.043678 
# }