Generate simulated data from MMGFM models

gendata_mmgfm(
  seed = 1,
  nvec = c(300, 200),
  pveclist = list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60)),
  q = 6,
  d = 3,
  qs = rep(2, length(nvec)),
  rho = rep(1, length(pveclist)),
  rho_z = 1,
  sigmavec = rep(0.5, length(pveclist)),
  n_bin = 1,
  sigma_eps = 1,
  heter_error = FALSE
)

Arguments

seed

a positive integer, the random seed for reproducibility of the data generation process.

nvec

a vector of positive integers, specifying the sample size of each study/source.

pveclist

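a named list, specifying the number of modalities of each type and the variable dimension of each modality. Each name is a modality type (e.g., gaussian, poisson, binomial) and each element is a vector with one variable dimension per modality of that type.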

q

a positive integer, specifying the number of study-shared factors.

d

a positive integer, specifying the dimension of the covariate matrix.

qs

a vector of positive integers, specifying the number of study-specified factors in each study; default as 2 for each study.

rho

a numeric vector of length length(pveclist) with positive elements, specifying the signal strength of the loading matrices for each modality type.

rho_z

a positive real, specifying the signal strength of the covariates.

sigmavec

a positive real vector of length length(pveclist), specifying the variance of the study-specified and modality variable-shared factors; default as 0.5 for each element.

n_bin

a positive integer, specifying the number of trials when generating the Binomial modality matrix; default as 1.

sigma_eps

a positive real, the variance of the overdispersion error; default as 1.

heter_error

a logical value, whether to generate heterogeneous errors; default as FALSE.
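
The length constraints in the argument descriptions can be checked before calling the generator. The following is a minimal sketch in base R: check_gendata_args is a hypothetical helper, not part of the package, and the rule that qs has one entry per study is inferred from its default value rather than stated explicitly.

```r
# Hypothetical helper (not part of the package): checks argument lengths
# against the constraints described in the Arguments section.
check_gendata_args <- function(nvec, pveclist, qs, rho, sigmavec) {
  stopifnot(
    length(qs) == length(nvec),            # assumed: one factor count per study
    length(rho) == length(pveclist),       # one signal strength per modality type
    length(sigmavec) == length(pveclist),  # one variance per modality type
    !is.null(names(pveclist)),             # pveclist must be a named list
    all(nvec > 0), all(rho > 0), all(sigmavec > 0)
  )
  invisible(TRUE)
}

# Usage with the default values shown in the usage block above.
nvec <- c(300, 200)
pveclist <- list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60))
ok <- check_gendata_args(nvec, pveclist,
                         qs = rep(2, length(nvec)),
                         rho = rep(1, length(pveclist)),
                         sigmavec = rep(0.5, length(pveclist)))
print(ok)
```

A mismatch, such as a qs vector with three entries for two studies, stops with an error before any data are generated.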

Value

Returns a list with the following components:

  • XList - an S-length list, each component an m-length list of combined modality matrices. Each combined matrix joins the observed modalities of the same type for one source/study, where S is the number of studies and m is the number of modality types.

  • ZList - an S-length list, each component the covariate matrix of one study.

  • tauList - an S-length list, each component an m-length list corresponding to the offset term for each combined modality of each study.

  • A0List - an M-length list composed of the loading matrix corresponding to the study-shared factors for each modality.

  • B0List - an M-length list composed of the loading matrix list corresponding to the study-specified factors for each modality.

  • VList - an S-length list composed of an M-length vector list corresponding to the study-specified and modality variable-shared factor for each study and modality.

  • F0List - an S-length list composed of the study-shared factor matrix for each study.

  • H0List - an S-length list composed of the study-specified factor matrix for each study.

  • betaList - an M-length list composed of the true regression coefficient matrix for each modality.

  • sigma_eps - a positive scalar, the variance of the error.

  • numvarmat - an m-by-T matrix whose row names are the modality types, giving the number of variables in each modality of each type, where m is the number of modality types and T is the maximum number of modalities in any one type.

  • types - a string vector, the modality types.

  • Lam0 - an S-length list composed of an M-length vector list corresponding to the error variances.
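
To make the numvarmat and types components concrete, the sketch below rebuilds a matrix of the described shape from pveclist using base R only. Padding unused cells with 0 is an assumption (the description does not say how they are filled), and p_combined, taken as the column count of each combined modality matrix in XList, is an inference from the description of XList rather than verified output.

```r
# Base R only; no package function is called here.
pveclist <- list(gaussian = c(50, 150), poisson = c(50), binomial = c(100, 60))

types <- names(pveclist)     # modality types
m <- length(pveclist)        # number of modality types
n_mod <- lengths(pveclist)   # number of modalities per type
T_max <- max(n_mod)          # largest number of modalities in any one type

# Assumed layout: one row per type, unused cells padded with 0.
numvarmat <- matrix(0, nrow = m, ncol = T_max, dimnames = list(types, NULL))
for (k in seq_len(m)) {
  numvarmat[k, seq_len(n_mod[k])] <- pveclist[[k]]
}
print(numvarmat)

# Total number of variables per type, i.e. the assumed column count of
# each combined modality matrix.
p_combined <- rowSums(numvarmat)
print(p_combined)
```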

Examples

# Three studies with 3 study-shared factors and 2 study-specified factors each
q <- 3; qsvec <- rep(2, 3)
nvec <- c(100, 120, 100)
# One Gaussian, two Poisson and two Binomial modalities
pveclist <- list('gaussian' = rep(150, 1), 'poisson' = rep(50, 2), 'binomial' = rep(60, 2))
datlist <- gendata_mmgfm(seed = 1, nvec = nvec, pveclist = pveclist,
                         q = q, d = 3, qs = qsvec, rho = rep(3, length(pveclist)), rho_z = 0.5,
                         sigmavec = rep(0.5, length(pveclist)), sigma_eps = 1)
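
A possible follow-up is to inspect the returned list and compare it with the shapes implied by the Value section. This sketch assumes the function is exported by a package named MMGFM; if that package is not installed, the guarded part is skipped. The expected shapes are inferences from the component descriptions, not verified output.

```r
nvec <- c(100, 120, 100)
pveclist <- list(gaussian = rep(150, 1), poisson = rep(50, 2), binomial = rep(60, 2))
d <- 3

# Assumed shapes: XList[[s]][[k]] has nvec[s] rows and sum(pveclist[[k]])
# columns; ZList[[s]] has nvec[s] rows and d columns.
expected_x_cols <- vapply(pveclist, sum, numeric(1))
print(expected_x_cols)

# Assumes the package name is MMGFM; skipped when it is not installed.
if (requireNamespace("MMGFM", quietly = TRUE)) {
  datlist <- MMGFM::gendata_mmgfm(seed = 1, nvec = nvec, pveclist = pveclist,
                                  q = 3, d = d, qs = rep(2, 3),
                                  rho = rep(3, length(pveclist)), rho_z = 0.5,
                                  sigmavec = rep(0.5, length(pveclist)),
                                  sigma_eps = 1)
  str(datlist, max.level = 1)             # top-level components
  print(length(datlist$XList))            # one element per study
  print(lapply(datlist$XList[[1]], dim))  # compare with expected_x_cols
  print(dim(datlist$ZList[[1]]))          # compare with c(nvec[1], d)
}
```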