DR.SC.fit.Rd
Joint dimension reduction and spatial clustering for scRNA-seq and spatial transcriptomics data
DR.SC_fit(X, K, Adj_sp=NULL, q=15,
error.heter= TRUE, beta_grid=seq(0.5, 5, by=0.5),
maxIter=25, epsLogLik=1e-5, verbose=FALSE, maxIter_ICM=6,
wpca.int=FALSE, int.model="EEE", approxPCA=FALSE, coreNum = 5)
X: a sparse matrix of class dgCMatrix or a dense matrix; the log-normalized gene expression matrix used for the DR-SC model (spots in rows and genes in columns, as in the example).
K: a positive integer, or a vector of positive integers; the number of clusters used in model fitting.
Adj_sp: an optional sparse matrix of class dgCMatrix; the adjacency matrix used for the DR-SC model. This interface is provided for users who want to define the adjacency matrix themselves.
q: a positive integer; the number of latent features to extract, default 15. The choice of q is a trade-off between model complexity and fit to the data, and depends on the goals of the analysis and the structure of the data. A higher value gives a more complex model with more parameters, which may lead to overfitting and poor generalization. A lower value gives a simpler model with fewer parameters, but may underfit the data.
error.heter: an optional logical value; whether to use heterogeneous errors in the DR-SC model, default TRUE. If error.heter=FALSE, homogeneous errors are used for the probabilistic PCA model in DR-SC.
beta_grid: an optional vector of positive values; the candidate set for the smoothing parameter, searched by grid search. The default is seq(0.5, 5, by=0.5).
maxIter: an optional positive value; the maximum number of EM iterations, default 25.
epsLogLik: an optional positive value; the tolerance for the relative change of the observed pseudo log-likelihood, default 1e-5.
verbose: an optional logical value; whether to print information from the ICM-EM algorithm, default FALSE.
maxIter_ICM: an optional positive value; the maximum number of ICM iterations, default 6.
wpca.int: an optional logical value; whether to use weighted PCA to obtain the initial values of the loadings and other parameters. The default is FALSE, which means ordinary PCA is used.
int.model: an optional string; the Gaussian mixture model used to compute the initial values for DR-SC, default "EEE". See Mclust for the names of other models.
approxPCA: an optional logical value; whether to use approximate PCA to speed up the computation of initial values, default FALSE.
coreNum: an optional positive integer; the number of threads used in parallel computing, default 5. If K has length one, coreNum is set to 1 automatically.
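For users who want to supply their own adjacency matrix, the following is a minimal sketch of building one as a dgCMatrix with the Matrix package. It assumes that a symmetric 0/1 neighbor indicator matrix is the intended form, which is not stated here; the coordinates `pos` and the name `Adj_manual` are hypothetical and used only for illustration.

```r
library(Matrix)

# Hypothetical spot coordinates on a 10 x 10 lattice (illustration only).
pos <- as.matrix(expand.grid(row = 1:10, col = 1:10))

# Pairwise Euclidean distances between spots.
d <- as.matrix(dist(pos))

# Treat two spots as neighbors when they are exactly one lattice step apart.
idx <- which(d > 0 & d <= 1, arr.ind = TRUE)

# sparseMatrix() with numeric x returns a general column-compressed
# sparse matrix, i.e. class dgCMatrix.
Adj_manual <- sparseMatrix(i = idx[, 1], j = idx[, 2], x = 1,
                           dims = c(nrow(pos), nrow(pos)))

# Corner spots have 2 neighbors, edge spots 3, and interior spots 4.
table(rowSums(Adj_manual))
```

A matrix built this way could then be passed through the Adj_sp argument in place of the one returned by getAdj.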
DR.SC_fit returns a list of class "drscObject" with the following three components:
a list of model fitting results, with as many elements as the length of K.
a numeric matrix used for model selection by MBIC.
a scalar or vector equal to input argument K.
In addition, each element of "Objdrsc" is a list with the following components:
inferred class labels
extracted latent features.
estimated smoothing parameter
mean vectors of the mixture components.
covariance matrix of the mixture components.
estimated loading matrix
estimated variance of errors in probabilistic PCA model
pseudo observed log-likelihood.
## Generate spatial transcriptomics data with a lattice neighborhood, i.e. the ST platform.
library(DR.SC)
seu <- gendata_RNAExp(height=10, width=10, p=50, K=4)
library(Seurat)
seu <- NormalizeData(seu, verbose=FALSE)
# choose 40 highly variable features using FindVariableFeatures in Seurat
# seu <- FindVariableFeatures(seu, nfeatures = 40)
# or choose 40 spatially variable features using FindSVGs in DR.SC
seu <- FindSVGs(seu, nfeatures = 40, verbose=FALSE)
# users define the adjacency matrix
Adj_sp <- getAdj(seu, platform = 'ST')
#> Neighbors were identified for 100 out of 100 spots.
var.features <- seu@assays$RNA@var.features
X <- Matrix::t(seu[["RNA"]]@data[var.features,])
# maxIter = 2 is used only for illustration; users can keep the default.
drscList <- DR.SC_fit(X,Adj_sp=Adj_sp, K=4, maxIter=2, verbose=TRUE)
#> Fit DR-SC model...
#> -------------------Calculate inital values-------------
#> Using accurate PCA to obtain initial values
#> -------------------Finish computing inital values-------------
#> -------------------Starting ICM-EM algortihm-------------
#> iter = 2, loglik= -1293.974640, dloglik=0.999999
#> -------------------Complete!-------------
#> elasped time is :0.06
#> Finish DR-SC model fitting