ICM-EM algorithm for fitting PRECAST model

ICM.EM(XList, q, K, AdjList=NULL,  Adjlist_car=NULL, posList = NULL, 
      platform = "ST", beta_grid=seq(0.2,4, by=0.2),maxIter_ICM=6,
      maxIter=20, epsLogLik=1e-5, verbose=TRUE,mix_prop_heter=TRUE, 
      Sigma_equal=FALSE, Sigma_diag=TRUE,error_heter=TRUE, Sp2=TRUE,
      wpca_int=FALSE, int.model='EEE', seed=1,coreNum = 1, coreNum_int=coreNum)

Arguments

XList

an M-length list consisting of multiple matrices with class dgCMatrix or matrix that specify the log-normalization gene expression matrix for each data sample used for PRECAST model.

q

a positive integer, specify the number of latent features to be extracted, default as 15.

K

a positive integer allowing scalar or vector, specify the number of clusters in model fitting.

AdjList

an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for Potts model in PRECAST. We provide this interface for those users who would like to define the adjacency matrix by their own.

Adjlist_car

an M-length list of sparse matrices with class dgCMatrix, specify the adjacency matrix used for CAR model in PRECAST, default as AdjList in the Potts model. We provide this interface for those users who would like to use the different adjacency matrix in CAR model.

posList

an M-length list composed by spatial coordinate matrix for each data sample.

platform

a string, specify the platform of the provided data, default as "Visium". There are many platforms to be supported, including ("Visuim", "ST", "SeqFISH", 'merFISH', 'slide-seqv2', 'seqscope', "HDST"). If AdjList is not given, the The platform helps to calculate the adjacency matrix by defining the neighbors.

beta_grid

an optional vector of positive value, the candidate set of the smoothing parameter to be searched by the grid-search optimization approach.

maxIter_ICM

an optional positive value, represents the maximum iterations of ICM.

maxIter

an optional positive value, represents the maximum iterations of EM.

epsLogLik

an optional positive vlaue, tolerance vlaue of relative variation rate of the observed pseudo log-loglikelihood value, defualt as '1e-5'.

verbose

an optional logical value, whether output the information of the ICM-EM algorithm.

mix_prop_heter

an optional logical value, specify whether betar are distict, default as TRUE.

Sigma_equal

an optional logical value, specify whether Sigmaks are equal, default as FALSE.

Sigma_diag

an optional logical value, specify whether Sigmaks are diagonal matrices, default as TRUE.

error_heter

an optional logical value, whether use the heterogenous error for DR-SC model, default as TRUE. If error_heter=FALSE, then the homogenuous error is used for probabilistic PCA model in PRECAST.

Sp2

an optional logical value, whether add the ICAR model component in the model, default as TRUE. We provide this interface for those users who don't want to include the ICAR model.

wpca_int

an optional logical value, means whether use the weighted PCA to obtain the initial values of loadings and other paramters, default as FALSE which means the ordinary PCA is used.

int.model

an optional string, specify which Gaussian mixture model is used in evaluting the initial values for PRECAST, default as "EEE"; and see Mclust for more models' names.

seed

an optional integer, the random seed in fitting PRECAST model.

coreNum

an optional positive integer, means the number of thread used in parallel computating.

coreNum_int

an optional positive integer, means the number of cores used in parallel computation for initial values when K is a vector, default as same as coreNum.

Details

Nothing

Value

ICM.EM returns a list with class "SeqKiDRSC_Object" with the number of components equal to the length of K, where each component includes the model fitting results for one number of cluster and is a list consisting of following components:

cluster

an M-length list that includes the inferred class labels for each data sample.

hZ

an M-length list that includes the batch corrected low-dimensional embeddings for each data sample.

hV

an M-length list that includes the estimate the ICAR component for each sample.

Rf

an M-length list that includes the posterior probability of domain clusters for each sample.

beta

an M-length vector that includes the estimated smoothing parameters for each sample.

Mu

mean vectors of mixtures components.

Sigma

covariance matrix of mixtures components.

W

estimated loading matrix

Lam

estimated variance of errors in probabilistic PCA model

loglik

pseudo observed log-likelihood.

Author

Wei Liu

Note

nothing

See also

None

Examples

# \donttest{
  ## we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform.
  library(Matrix)
  q <- 10; K <- 4
  data(PRECASTObj)
  posList <- lapply(PRECASTObj@seulist, function(x) cbind(x$row, x$col))
  AdjList <- lapply(posList, getAdj_reg, platform='ST')
#> Neighbors were identified for 900 out of 900 spots.
#> Neighbors were identified for 900 out of 900 spots.
  XList <- lapply(PRECASTObj@seulist, function(x) t(x[['RNA']]@data))
  XList <- lapply(XList, scale, scale=FALSE)
  ## For illustration, maxIter is set to 4
  resList <- ICM.EM(XList,AdjList = AdjList, maxIter=4,
                   q=q, K=K, verbose=TRUE)
#> Intergrative data info.: 2 samples, 97 genes X 1800 spots------
#> PRECAST model setting: error_heter=TRUE, Sigma_equal=FALSE, Sigma_diag=TRUE, mix_prop_heter=TRUE
#> Start computing intial values... 
#> fitting ...
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |======================================================================| 100%
#> ----Fitting PRECAST model----------------
#> variable initialize finish! 
#> predict Y and V! 
#> diff Energy = 44.205532 
#> Finish ICM step! 
#> iter = 2, loglik= -59510.035156, dloglik=0.999972 
#> predict Y and V! 
#> diff Energy = 12.547775 
#> diff Energy = 3.939711 
#> Finish ICM step! 
#> iter = 3, loglik= -53027.710938, dloglik=0.108928 
#> predict Y and V! 
#> diff Energy = 7.640621 
#> diff Energy = 1.688112 
#> Finish ICM step! 
#> iter = 4, loglik= -50711.550781, dloglik=0.043678 
# }