Joint dimension reduction and spatial clustering for scRNA-seq and spatial transcriptomics data

DR.SC_fit(X, K, Adj_sp=NULL, q=15,
             error.heter= TRUE, beta_grid=seq(0.5, 5, by=0.5),
             maxIter=25, epsLogLik=1e-5, verbose=FALSE, maxIter_ICM=6,
             wpca.int=FALSE, int.model="EEE", approxPCA=FALSE, coreNum = 5)

Arguments

X

a sparse matrix of class dgCMatrix, or a dense matrix, giving the log-normalized gene expression matrix used by the DR-SC model. In the example below, rows are spots and columns are genes.

K

a positive integer, either a scalar or a vector, specifying the number of clusters (or the candidate numbers of clusters) used in model fitting.

Adj_sp

an optional sparse matrix of class dgCMatrix, specifying the adjacency matrix used by the DR-SC model. This interface is provided for users who would like to define the adjacency matrix themselves.

q

a positive integer specifying the number of latent features to extract; default is 15. The choice of q is a trade-off between model complexity and fit to the data, and depends on the goals of the analysis and the structure of the data. A larger value gives a more complex model with more parameters, which may lead to overfitting and poor generalization. A smaller value gives a simpler model with fewer parameters, but may lead to underfitting.

error.heter

an optional logical value indicating whether to use heterogeneous errors in the DR-SC model; default is TRUE. If error.heter=FALSE, homogeneous errors are used in the probabilistic PCA model in DR-SC.

beta_grid

an optional vector of positive values, the candidate set for the smoothing parameter, searched by grid search; default is seq(0.5, 5, by=0.5).

maxIter

an optional positive integer, the maximum number of EM iterations; default is 25.

epsLogLik

an optional positive value, the tolerance for the relative change in the observed pseudo log-likelihood; default is 1e-5.

verbose

an optional logical value indicating whether to print information about the ICM-EM algorithm; default is FALSE.

maxIter_ICM

an optional positive integer, the maximum number of ICM iterations; default is 6.

wpca.int

an optional logical value indicating whether to use weighted PCA to obtain the initial values of the loadings and other parameters; default is FALSE, meaning ordinary PCA is used.

int.model

an optional string specifying which Gaussian mixture model is used to evaluate the initial values for DR-SC; default is "EEE". See Mclust for other model names.

approxPCA

an optional logical value indicating whether to use approximate PCA to speed up the computation of initial values; default is FALSE.

coreNum

an optional positive integer, the number of threads used in parallel computation; default is 5. If K has length one, coreNum is set to 1 automatically.
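
For users who want to supply their own Adj_sp, the following is a minimal sketch of what such a matrix could look like for a small lattice. It is not the package's getAdj implementation: lattice_adj is a hypothetical helper, and it assumes that a symmetric 0/1 dgCMatrix with one row and column per spot, ordered like the rows of X, is an acceptable input.

```r
library(Matrix)

# Hypothetical helper (not part of DR.SC): 4-neighbour adjacency on a
# height x width lattice, with spots numbered column-major as in matrix().
lattice_adj <- function(height, width) {
  n   <- height * width
  idx <- matrix(seq_len(n), nrow = height, ncol = width)
  # vertical neighbours: spot (r, c) and spot (r + 1, c)
  v_from <- idx[-height, , drop = FALSE]
  v_to   <- idx[-1, , drop = FALSE]
  # horizontal neighbours: spot (r, c) and spot (r, c + 1)
  h_from <- idx[, -width, drop = FALSE]
  h_to   <- idx[, -1, drop = FALSE]
  from <- c(v_from, h_from)
  to   <- c(v_to, h_to)
  # store every edge in both directions so the matrix is symmetric
  sparseMatrix(i = c(from, to), j = c(to, from), x = 1, dims = c(n, n))
}

Adj_toy <- lattice_adj(3, 3)
class(Adj_toy)
```

Whatever construction is used, the spot ordering of the adjacency matrix has to match the row ordering of X; otherwise the spatial smoothing would be applied to the wrong neighbours.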

Details

Nothing

Value

DR.SC_fit returns a list with class "drscObject" with the following three components:

Objdrsc

a list of model fitting results, with one element per value in K.

out_param

a numeric matrix used for model selection by MBIC.

K_set

a scalar or vector equal to input argument K.

In addition, each element of "Objdrsc" is a list with the following components:

cluster

inferred class labels

hZ

extracted latent features.

beta

estimated smoothing parameter

Mu

mean vectors of the mixture components.

Sigma

covariance matrices of the mixture components.

W

estimated loading matrix

Lam_vec

estimated variance of errors in probabilistic PCA model

loglik

pseudo observed log-likelihood.
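
The nesting described above can be illustrated with a hand-built stand-in. This is only a sketch: drsc_mock is not produced by DR.SC_fit, all values in it are placeholders, and both the spots-by-q shape of hZ and the assumption that Objdrsc follows the order of K_set are guesses rather than documented behaviour.

```r
# Hand-built stand-in that mimics the documented component names.
# A real object would come from DR.SC_fit(); values here are placeholders.
n <- 6          # number of spots (placeholder)
q <- 2          # number of latent features (placeholder)
K_set <- c(2, 3)

mock_fit <- function(K) {
  list(
    cluster = rep(seq_len(K), length.out = n),  # placeholder labels
    hZ      = matrix(0, nrow = n, ncol = q)     # assumed spots x q layout
  )
}

drsc_mock <- structure(
  list(Objdrsc = lapply(K_set, mock_fit), out_param = NULL, K_set = K_set),
  class = "drscObject"
)

# Pick the fit for K = 3, assuming Objdrsc is ordered like K_set.
fit_K3 <- drsc_mock$Objdrsc[[which(drsc_mock$K_set == 3)]]
table(fit_K3$cluster)
dim(fit_K3$hZ)
```

With a real fit, the same indexing pattern would give access to the remaining components (beta, Mu, Sigma, W, Lam_vec, loglik) of each element.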

Author

Wei Liu

Note

nothing

See also

None

Examples

## Generate spatial transcriptomics data with a lattice neighborhood, i.e. the ST platform.
seu <- gendata_RNAExp(height=10, width=10,p=50, K=4)
library(Seurat)
#> Warning: package ‘Seurat’ was built under R version 4.1.3
seu <- NormalizeData(seu, verbose=FALSE)
# choose 40 highly variable features using FindVariableFeatures in Seurat
# seu <- FindVariableFeatures(seu, nfeatures = 40)
# or choose 40 spatially variable features using FindSVGs in DR.SC
seu <- FindSVGs(seu, nfeatures = 40, verbose=FALSE)
# users define the adjacency matrix
Adj_sp <- getAdj(seu, platform = 'ST')
#> Neighbors were identified for 100 out of 100 spots.
var.features <- seu@assays$RNA@var.features
X <- Matrix::t(seu[["RNA"]]@data[var.features,])
# maxIter = 2 is used only for illustration; in practice the default can be used.
drscList <- DR.SC_fit(X,Adj_sp=Adj_sp, K=4, maxIter=2, verbose=TRUE)
#> Fit DR-SC model...
#> -------------------Calculate inital values-------------
#> Using accurate PCA to obtain initial values
#> -------------------Finish computing inital values------------- 
#> -------------------Starting  ICM-EM algortihm-------------
#> iter = 2, loglik= -1293.974640, dloglik=0.999999 
#> -------------------Complete!-------------
#> elasped time is :0.06
#> Finish DR-SC model fitting