Generate simulated data for multi-study factor analysis under different error distributions. The data follows a factor model with common factors (shared across studies) and study-specific factors (unique to each study), plus noise.
seed: Integer, default = 1. Random seed for reproducibility of the simulated data.
nvec: Numeric vector (length >= 2). Sample size of each study (e.g., `c(150, 200)` for 2 studies with 150 and 200 samples).
p: Integer, default = 50. Number of variables (features) in the data.
q: Integer, default = 3. Number of common factors (shared across all studies).
qs: Numeric vector of length `length(nvec)`, default = `rep(2, length(nvec))`. Number of study-specific factors in each study (e.g., `c(2, 2)` for 2 studies, each with 2 specific factors).
err.type: Character, default = "gaussian". Error distribution type, one of:
- "gaussian": Gaussian (normal) distribution;
- "mvt": multivariate t-distribution;
- "exp": exponential distribution (centered to mean 0);
- "t": univariate t-distribution (independent across variables);
- "mixnorm": mixture of two normal distributions;
- "pareto": Pareto distribution (centered to mean 0).
rho: Numeric vector of length 2, default = `c(1, 1)`. Scaling factors for:
- `rho[1]`: common factor loadings (matrix `A0`);
- `rho[2]`: study-specific factor loadings (list of matrices `Blist0`).
sigma2_eps: Numeric, default = 0.1. Variance of the error term (controls the noise level).
Integer, default = 1. Degrees of freedom of the t-distribution, used only when `err.type` is "mvt" or "t"; ignored for the other error distributions.
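To make the error types above concrete, the sketch below draws an n × p error matrix for each one with base R samplers. It is a hypothetical helper, `draw_noise()`, and not the package's internal code: the 50/50 mixture of N(0, 1) and N(0, 3^2), the Pareto parameters (x_m = 1, alpha = 3), and the final scaling by `sqrt(sigma2_eps)` are all assumptions chosen for illustration.

```r
# Hypothetical helper, not the package's internal code: draw an n x p error
# matrix for each documented err.type. Mixture weights/SDs and the Pareto
# parameters below are illustrative assumptions.
draw_noise <- function(n, p, err.type = "gaussian", sigma2_eps = 0.1, nu = 1) {
  m <- n * p
  E <- switch(err.type,
    gaussian = matrix(rnorm(m), n, p),
    # Multivariate t: one chi-square draw per row, shared by all p variables
    # (the length-n divisor is recycled down each column, i.e. row-wise)
    mvt = matrix(rnorm(m), n, p) / sqrt(rchisq(n, df = nu) / nu),
    # Exponential(rate = 1) has mean 1, so subtract 1 to center it
    exp = matrix(rexp(m) - 1, n, p),
    # Univariate t: independent draws for every entry
    t = matrix(rt(m, df = nu), n, p),
    # 50/50 mixture of N(0, 1) and N(0, 3^2)
    mixnorm = matrix(rnorm(m, sd = ifelse(runif(m) < 0.5, 1, 3)), n, p),
    # Pareto(x_m = 1, alpha = 3) by inverse CDF; its mean is alpha / (alpha - 1) = 1.5
    pareto = matrix(1 / runif(m)^(1 / 3) - 1.5, n, p),
    stop("unknown err.type: ", err.type)
  )
  sqrt(sigma2_eps) * E
}

set.seed(1)
E <- draw_noise(100, 50, "mvt", sigma2_eps = 0.1, nu = 1)
dim(E)  # 100 50
```

With `nu = 1` the "t" and "mvt" draws are Cauchy-type and have no finite variance, so in this sketch `sigma2_eps` acts as a squared scale rather than a variance for those two types.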
A list containing the simulated data and true parameter values (for model evaluation):
Xlist: List of matrices. Each element is a data matrix (ns × p) for study s,
where ns = `nvec[s]` (sample size of study s), p = number of variables.
mu0: Matrix (p × S). True mean vector for each variable (row) in each study (column),
where S = `length(nvec)` (number of studies).
A0: Matrix (p × q). True common factor loadings (shared across all studies),
constructed from the first q columns of an orthogonal matrix (`A1`) generated internally and scaled by `rho[1]`.
This is the "ground truth" that model-fitting functions (e.g., MultiRFM) aim to estimate.
Blist0: List of matrices. Each element is a true study-specific factor loadings matrix (p × qs[s])
for study s. Constructed from orthogonal matrices (similar to `A0`) and scaled by `rho[2]`.
Another "ground truth" for model evaluation.
Flist: List of matrices. Each element is a true common factor score matrix (ns × q) for study s,
generated from a standard normal distribution. These are the latent common factor values used to generate `Xlist`.
Hlist: List of matrices. Each element is a true study-specific factor score matrix (ns × qs[s])
for study s, generated from a standard normal distribution. Latent specific factor values used to generate `Xlist`.
q: Integer. Number of common factors used for data generation (same as input `q`, for reference).
qs: Numeric vector. Number of study-specific factors used for data generation (same as input `qs`, for reference).
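Loadings of the kind described for `A0` and `Blist0` can be sketched by taking the first columns of a random orthogonal matrix obtained from a QR decomposition and scaling them. The helper `make_loadings()` is hypothetical; how the package actually draws `A1` and the specific-loading matrices is an assumption here, and only the "orthogonal columns, scaled by `rho`" idea comes from the description above.

```r
# Hypothetical sketch: take the first k columns of a random p x p orthogonal
# matrix (from a QR decomposition) and scale them by rho_k.
make_loadings <- function(p, k, rho_k = 1) {
  A1 <- qr.Q(qr(matrix(rnorm(p * p), p, p)))  # p x p orthogonal matrix
  rho_k * A1[, seq_len(k), drop = FALSE]      # first k columns, scaled
}

set.seed(1)
p <- 50; q <- 3; qs <- c(2, 2); rho <- c(1, 1)
A0 <- make_loadings(p, q, rho[1])                             # p x q common loadings
Blist0 <- lapply(qs, function(k) make_loadings(p, k, rho[2])) # p x qs[s] per study
round(crossprod(A0), 8)  # q x q identity, because the columns are orthonormal
```

Because the columns are orthonormal, `crossprod(A0)` equals `rho[1]^2` times the identity, which is what makes the scale of the loadings controlled by `rho` alone.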
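The simulated data follow the multi-study factor model
$$X_s = \mathbf{1}_{n_s}\mu_{0s}^\top + F_s A_0^\top + H_s B_{0s}^\top + \varepsilon_s, \qquad s = 1, \dots, S,$$
where $\mu_{0s}$ is column s of `mu0`, $F_s$ = `Flist[[s]]` and $H_s$ = `Hlist[[s]]` are the common and study-specific factor scores, $B_{0s}$ = `Blist0[[s]]`, and $\varepsilon_s$ is an $n_s \times p$ error matrix drawn according to `err.type`.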
The true loading matrices (`A0`, `Blist0`) are generated under orthogonality constraints to ensure identifiability.
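The model equation can be made concrete with a few lines of base R. This is a minimal sketch, assuming Gaussian errors with variance `sigma2_eps`, standard-normal factor scores and means, and orthonormal loadings; `simulate_study()` is a hypothetical name and not a function of the package.

```r
# Hypothetical sketch of one study from the model
#   X_s = 1 mu0s^T + F_s A0^T + H_s B0s^T + eps_s
# with standard-normal factor scores and Gaussian errors of variance sigma2_eps.
simulate_study <- function(n_s, mu0s, A0, B0s, sigma2_eps = 0.1) {
  p <- nrow(A0)
  Fs <- matrix(rnorm(n_s * ncol(A0)), n_s, ncol(A0))    # common factor scores
  Hs <- matrix(rnorm(n_s * ncol(B0s)), n_s, ncol(B0s))  # study-specific scores
  eps <- matrix(rnorm(n_s * p, sd = sqrt(sigma2_eps)), n_s, p)
  Xs <- matrix(mu0s, n_s, p, byrow = TRUE) +            # row-wise mean 1 mu0s^T
    Fs %*% t(A0) + Hs %*% t(B0s) + eps
  list(X = Xs, F = Fs, H = Hs)
}

set.seed(1)
p <- 50; q <- 3; nvec <- c(100, 200); qs <- c(2, 2)
A0 <- qr.Q(qr(matrix(rnorm(p * q), p, q)))              # orthonormal common loadings
mu0 <- matrix(rnorm(p * length(nvec)), p, length(nvec)) # p x S means
Xlist <- lapply(seq_along(nvec), function(s) {
  B0s <- qr.Q(qr(matrix(rnorm(p * qs[s]), p, qs[s])))  # orthonormal specific loadings
  simulate_study(nvec[s], mu0[, s], A0, B0s)$X
})
sapply(Xlist, dim)  # columns give (100, 50) and (200, 50)
```

Note the transposes: with `Fs` of size n_s × q and `A0` of size p × q, only `Fs %*% t(A0)` yields the n_s × p shape of `Xs`.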
# Example 1: Gaussian error (2 studies with 100 and 200 samples, 50 variables)
library(MultiRFM)  # package assumed to provide gendata_simu_multi()
set.seed(123)
sim_data <- gendata_simu_multi(
seed = 123,
nvec = c(100, 200),
p = 50,
q = 3, # 3 common factors
qs = c(2, 2), # 2 specific factors per study
err.type = "gaussian",
rho = c(1, 1),
sigma2_eps = 0.1
)
str(sim_data) # Check structure of simulated data
#> List of 8
#> $ Xlist :List of 2
#> ..$ : num [1:100, 1:50] 0.322 -0.524 1.629 -2.444 -1.823 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> ..$ : num [1:200, 1:50] -1.086 0.464 -0.245 -1.001 -0.08 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> $ mu0 : num [1:50, 1:2] -0.5605 -0.2302 1.5587 0.0705 0.1293 ...
#> $ A0 : num [1:50, 1:3] 0.496 -0.18 0.172 0.243 0.665 ...
#> $ Blist0:List of 2
#> ..$ : num [1:50, 1:2] 0.131 0.1408 0.1362 -0.0422 -0.4109 ...
#> ..$ : num [1:50, 1:2] 0.2264 -0.0467 -0.4008 0.2853 0.0203 ...
#> $ Flist :List of 2
#> ..$ : num [1:100, 1:3] -0.6042 0.9134 1.6575 -0.0389 -0.3313 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> ..$ : num [1:200, 1:3] -0.0731 -0.1409 -0.7031 0.2109 0.797 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> $ Hlist :List of 2
#> ..$ : num [1:100, 1:2] 0.25352 0.24352 0.71959 0.77273 -0.00852 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> ..$ : num [1:200, 1:2] -1.9131 -0.3297 -0.9296 0.1346 0.0831 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : NULL
#> $ q : num 3
#> $ qs : num [1:2] 2 2
# Extract true parameters for model evaluation
true_A <- sim_data$A0 # True common loadings
true_B1 <- sim_data$Blist0[[1]] # True specific loadings (study 1)
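Because factor loadings are identified only up to an invertible transformation, comparing an estimate with `true_A` entry by entry is not meaningful; a subspace measure is more appropriate. The `trace_stat()` helper below is a hypothetical sketch of the trace statistic often used in the factor-model literature, not a documented evaluation metric of this package, and `A_hat` stands for whatever p × q estimate a fitting method returns.

```r
# Hypothetical helper: trace statistic tr(A' P A) / tr(A' A), where P projects
# onto the column space of A_hat. It lies in [0, 1] and equals 1 when the
# columns of A_true lie in span(A_hat), so it ignores rotations of the factors.
trace_stat <- function(A_hat, A_true) {
  P_hat <- A_hat %*% solve(crossprod(A_hat), t(A_hat))  # projection onto span(A_hat)
  sum(diag(t(A_true) %*% P_hat %*% A_true)) / sum(diag(crossprod(A_true)))
}

set.seed(1)
A <- qr.Q(qr(matrix(rnorm(50 * 3), 50, 3)))
Rot <- matrix(rnorm(9), 3, 3)            # invertible with probability 1
trace_stat(A %*% Rot, A)                 # 1 up to rounding: same column space
trace_stat(matrix(rnorm(150), 50, 3), A) # an unrelated subspace typically scores much lower
```

With the objects from the example above, an estimate `A_hat` would be scored as `trace_stat(A_hat, true_A)`, and the same call works for `true_B1`.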