ICM-EM algorithm implementation

ICM-EM algorithm for fitting PRECAST model

ICM.EM(XList, q, K, AdjList=NULL,  Adjlist_car=NULL, posList = NULL, 
      platform = "ST", beta_grid=seq(0.2,4, by=0.2),maxIter_ICM=6,
      maxIter=20, epsLogLik=1e-5, verbose=TRUE,mix_prop_heter=TRUE, 
      Sigma_equal=FALSE, Sigma_diag=TRUE,error_heter=TRUE, Sp2=TRUE,
      wpca_int=FALSE, int.model='EEE', seed=1,coreNum = 1, coreNum_int=coreNum)

Arguments

XList: an M-length list consisting of multiple matrices with class dgCMatrix or matrix that specify the log-normalization gene expression matrix for each data sample used for PRECAST model.
q: a positive integer, specify the number of latent features to be extracted, default as 15.
K: a positive integer allowing scalar or vector, specify the number of clusters in model fitting.
AdjList: an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for Potts model in PRECAST. We provide this interface for those users who would like to define the adjacency matrix by their own.
Adjlist_car: an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for CAR model in PRECAST, default as AdjList in the Potts model. We provide this interface for those users who would like to use the different adjacency matrix in CAR model.
posList: an M-length list composed by spatial coordinate matrix for each data sample.
platform: a string, specify the platform of the provided data, default as "Visium". There are many platforms to be supported, including ("Visuim", "ST", "SeqFISH", 'merFISH', 'slide-seqv2', 'seqscope', "HDST"). If AdjList is not given, the The platform helps to calculate the adjacency matrix by defining the neighbors.
beta_grid: an optional vector of positive value, the candidate set of the smoothing parameter to be searched by the grid-search optimization approach.
maxIter_ICM: an optional positive value, represents the maximum iterations of ICM.
maxIter: an optional positive value, represents the maximum iterations of EM.
epsLogLik: an optional positive vlaue, tolerance vlaue of relative variation rate of the observed pseudo log-loglikelihood value, defualt as '1e-5'.
verbose: an optional logical value, whether output the information of the ICM-EM algorithm.
mix_prop_heter: an optional logical value, specify whether betar are distict, default as TRUE.
Sigma_equal: an optional logical value, specify whether Sigmaks are equal, default as FALSE.
Sigma_diag: an optional logical value, specify whether Sigmaks are diagonal matrices, default as TRUE.
error_heter: an optional logical value, whether use the heterogenous error for DR-SC model, default as TRUE. If error_heter=FALSE, then the homogenuous error is used for probabilistic PCA model in PRECAST.
Sp2: an optional logical value, whether add the ICAR model component in the model, default as TRUE. We provide this interface for those users who don't want to include the ICAR model.
wpca_int: an optional logical value, means whether use the weighted PCA to obtain the initial values of loadings and other paramters, default as FALSE which means the ordinary PCA is used.
int.model: an optional string, specify which Gaussian mixture model is used in evaluting the initial values for PRECAST, default as "EEE"; and see Mclust for more models' names.
seed: an optional integer, the random seed in fitting PRECAST model.
coreNum: an optional positive integer, means the number of thread used in parallel computating.
coreNum_int: an optional positive integer, means the number of cores used in parallel computation for initial values when K is a vector, default as same as coreNum.

Details

Nothing

Value

ICM.EM returns a list with class "SeqKiDRSC_Object" with the number of components equal to the length of K, where each component includes the model fitting results for one number of cluster and is a list consisting of following components:

cluster: an M-length list that includes the inferred class labels for each data sample.
hZ: an M-length list that includes the batch corrected low-dimensional embeddings for each data sample.
hV: an M-length list that includes the estimate the ICAR component for each sample.
Rf: an M-length list that includes the posterior probability of domain clusters for each sample.
beta: an M-length vector that includes the estimated smoothing parameters for each sample.
Mu: mean vectors of mixtures components.
Sigma: covariance matrix of mixtures components.
W: estimated loading matrix
Lam: estimated variance of errors in probabilistic PCA model
loglik: pseudo observed log-likelihood.

References

Wei Liu, Liao, X., Luo, Z. et al, Jin Liu* (2023). Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nature Communications, 14, 296

Author

Wei Liu

Note

nothing

Examples

# \donttest{
  ## we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform.
  library(Matrix)
  q <- 10; K <- 4
  data(PRECASTObj)
  posList <- lapply(PRECASTObj@seulist, function(x) cbind(x$row, x$col))
  AdjList <- lapply(posList, getAdj_reg, platform='ST')
#> Neighbors were identified for 900 out of 900 spots.
#> Neighbors were identified for 900 out of 900 spots.
  XList <- lapply(PRECASTObj@seulist, function(x) t(x[['RNA']]@data))
  XList <- lapply(XList, scale, scale=FALSE)
  ## For illustration, maxIter is set to 4
  resList <- ICM.EM(XList,AdjList = AdjList, maxIter=4,
                   q=q, K=K, verbose=TRUE)
#> Intergrative data info.: 2 samples, 97 genes X 1800 spots------
#> PRECAST model setting: error_heter=TRUE, Sigma_equal=FALSE, Sigma_diag=TRUE, mix_prop_heter=TRUE
#> Start computing intial values... 
#> fitting ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> ----Fitting PRECAST model----------------
#> variable initialize finish! 
#> predict Y and V! 
#> diff Energy = 44.205532 
#> Finish ICM step! 
#> iter = 2, loglik= -59510.035156, dloglik=0.999972 
#> predict Y and V! 
#> diff Energy = 12.547775 
#> diff Energy = 3.939711 
#> Finish ICM step! 
#> iter = 3, loglik= -53027.710938, dloglik=0.108928 
#> predict Y and V! 
#> diff Energy = 7.640621 
#> diff Energy = 1.688112 
#> Finish ICM step! 
#> iter = 4, loglik= -50711.550781, dloglik=0.043678 
# }