Fits a linear regression when covariates include missing values, using Iterative Least Square Estimation to embed the correlation information between covariates.

  # S3 method for formula
ilse(formula, data=NULL, bw=NULL,  k.type=NULL,  method="Par.cond", ...)
  # S3 method for numeric
ilse(Y, X,bw=NULL, k.type=NULL, method="Par.cond", max.iter=20,
  peps=1e-5, feps = 1e-7, arma=TRUE, verbose=FALSE, ...)



...
Arguments passed to other methods.

formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

Y
a numeric vector, the response variable.

X
a numeric matrix that may include NAs, the covariate matrix.

data
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which ilse is called.

bw
a positive value specifying the bandwidth used in estimating missing values; defaults to NULL. When bw=NULL, the bandwidth is selected automatically by an empirical method.

k.type
an optional character string specifying the type of kernel used in the iterative estimation algorithm. The current version supports 'epk', 'biweight', 'triangle', 'gaussian', 'triweight', 'tricube', 'cosine' and 'uniform'; the default is 'gaussian'.

method
an optional character string specifying the iterative algorithm; the current version supports 'Par.cond' and 'Full.cond'.

max.iter
an optional positive integer, the maximum number of iterations; defaults to 20.

peps
an optional positive value, the tolerance for the relative variation rate of the estimated parameter vector; defaults to 1e-5.

feps
an optional positive value, the tolerance for the relative variation rate of the objective function value; defaults to 1e-7.

arma
an optional logical value indicating whether to use armadillo and Rcpp to speed up computation; defaults to TRUE.

verbose
an optional logical value indicating whether to print information for each iteration; defaults to FALSE.
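
The roles of max.iter, peps and feps can be illustrated with a small base-R sketch. This is not the package's code: the helper names rel_change and has_converged are hypothetical, and both the exact form of the "relative variation rate" and the rule for combining the two tolerances (here, stop when either one falls below its threshold) are assumptions made for illustration.

```r
# Hypothetical helper (not part of ilse): relative change between two
# successive values, guarded against division by zero.
rel_change <- function(new, old) {
  sqrt(sum((new - old)^2)) / max(sqrt(sum(old^2)), .Machine$double.eps)
}

# Assumed stopping rule: stop when either relative change is below its tolerance.
has_converged <- function(beta_new, beta_old, fn_new, fn_old,
                          peps = 1e-5, feps = 1e-7) {
  d_par <- rel_change(beta_new, beta_old)  # change in the parameter vector
  d_fn  <- rel_change(fn_new, fn_old)      # change in the objective value
  list(d.par = d_par, d.fn = d_fn, stop = (d_par < peps) || (d_fn < feps))
}

# Two illustrative checks: a large update, then an update that changes nothing.
chk1 <- has_converged(c(2.0, 0.5), c(1.0, 0.4), fn_new = 10, fn_old = 12)
chk2 <- has_converged(c(2.0, 0.5), c(2.0, 0.5), fn_new = 10, fn_old = 10)
print(chk1$stop)
print(chk2$stop)
```

In an iterative fit, a check of this kind would run once per iteration, with max.iter acting as a hard cap if neither tolerance is reached.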


Models for ilse are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second.
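
The expansion rules described above are those of standard R formulas, so they can be checked with base R alone, without fitting a model. The sketch below uses the placeholder variable names y, first and second; it only inspects the term labels that R derives from each formula.

```r
# Term labels derived from a formula using base R's terms().
labels_star <- attr(terms(y ~ first * second), "term.labels")
labels_long <- attr(terms(y ~ first + second + first:second), "term.labels")

# first*second expands to first + second + first:second
print(labels_star)
print(identical(labels_star, labels_long))

# Duplicated terms are removed
labels_dup <- attr(terms(y ~ first + second + first), "term.labels")
print(labels_dup)
```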


ilse returns an object of class "ilse".

The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by ilse.

An object of class "ilse" is a list containing at least the following components:


beta
a named vector of coefficients.


an imputed design matrix.


d.fn
a nonnegative value, the relative variation rate of the objective function value when the algorithm stopped.


d.par
a nonnegative value, the relative variation rate of the estimated parameter vector when the algorithm stopped.


iterations
a positive integer, the total number of iterations.


the residuals, that is response minus fitted values.


the fitted mean values.


a list including all input arguments.


Huazhen Lin, Wei Liu, & Wei Lan (2021). Regression Analysis with individual-specific patterns of missing covariates. Journal of Business & Economic Statistics, 39(1), 179-188.


Wei Liu





## example one: include missing value
data(nhanes)
NAlm1 <- ilse(age~., data=nhanes, bw=1, method = 'Par.cond', k.type='gaussian', verbose = TRUE)
#> iter=2, d_fn=1.000000, d_par = 0.362223
#> iter=3, d_fn=0.001542, d_par = 0.001063
#> iter=4, d_fn=0.000009, d_par = 0.000021
#> iter=5, d_fn=0.000000, d_par = 0.000000
print(NAlm1)
#> $beta
#> (Intercept)         bmi         hyp         chl
#>  2.06942664 -0.11656313  0.62109971  0.01056364
#>
#> $d.fn
#> [1] 6.654795e-08
#>
#> $d.par
#> [1] 1.096466e-07
#>
#> $iterations
#> [1] 5
#>
NAlm2 <- ilse(age~., data=nhanes, method = 'Full.cond')
print(NAlm2)
#> $beta
#> (Intercept)         bmi         hyp         chl
#>  2.06942664 -0.11656313  0.62109971  0.01056364
#>
#> $d.fn
#> [1] 6.654795e-08
#>
#> $d.par
#> [1] 1.096466e-07
#>
#> $iterations
#> [1] 5
#>

## example two: no missing value
n <- 100
group <- rnorm(n, sd=4)
weight <- 3.2*group + 1.5 + rnorm(n, sd=0.1)
NAlm3 <- ilse(weight~group, data=data.frame(weight=weight, group=group), intercept = FALSE)
print(NAlm3)
#> $beta
#> (Intercept)       group
#>    1.518525    3.197213
#>
#> $d.fn
#> [1] 1
#>
#> $d.par
#> [1] 1.986027e-15
#>
#> $iterations
#> [1] 2
#>