Statistical Modelling 23 (1) (2023), 53–80

Mixed effect modelling and variable selection for quantile regression

Haim Bar
Department of Statistics, University of Connecticut, Storrs, CT, USA.

James G Booth
Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA.
e-mail: jim.booth@cornell.edu

Martin T Wells
Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA.

Abstract:

It is known that the estimating equations for quantile regression (QR) can be solved using an EM algorithm in which the M-step is computed via weighted least squares, with weights computed at the E-step as the expectations of independent generalized inverse-Gaussian variables. This fact is exploited here to extend QR to allow for random effects in the linear predictor. Convergence of the algorithm in this setting is established by showing that it is a generalized alternating minimization (GAM) procedure. A further modification of the EM algorithm allows us to adapt a recently proposed method for variable selection in mean regression models to the QR setting. Simulations show that the resulting method significantly outperforms variable selection in QR models using the lasso penalty. Applications to real data include a frailty QR analysis of hospital stays, and variable selection for age at onset of lung cancer and for riboflavin production rate, using high-dimensional gene expression arrays as predictors.
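As a rough illustration of the EM/weighted-least-squares idea described in the abstract, the sketch below fits an ordinary fixed-effects linear QR model (no random effects, no variable selection). It assumes the normal variance-mean mixture representation of the asymmetric Laplace likelihood, under which each latent scale z_i given the data is generalized inverse-Gaussian and the E-step weight works out to E[1/z_i | y_i] = 1/(τ(1−τ)|r_i|), where r_i = y_i − x_iᵀβ is the current residual. The constant 1/(τ(1−τ)) cancels, so the M-step reduces to β ← (XᵀWX)⁻¹(XᵀWy + (2τ−1)Xᵀ1) with W = diag(1/|r_i|). This parameterization may differ from the one used in the paper, and the names `qr_em`, `check_loss` and the residual floor `eps` are hypothetical choices for this sketch, not the authors' code.

```python
import numpy as np


def check_loss(r, tau):
    """Sum of the quantile check loss rho_tau(r) = r * (tau - 1{r < 0})."""
    return float(np.sum(r * (tau - (r < 0).astype(float))))


def qr_em(X, y, tau, eps=1e-6, tol=1e-10, max_iter=2000):
    """Hypothetical EM / IRLS sketch for linear quantile regression.

    Assumes the asymmetric Laplace normal variance-mean mixture, under
    which E[1/z_i | y_i] is proportional to 1/|r_i|. Residuals are floored
    at eps, so the iteration minimizes a slightly smoothed check loss
    (|r| is replaced by r**2 / (2*eps) + eps/2 whenever |r| < eps).
    """
    # Start from the ordinary least-squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        r = y - X @ beta
        # E-step: weights up to the constant 1/(tau*(1-tau)), which cancels.
        w = 1.0 / np.maximum(np.abs(r), eps)
        # M-step: weighted least squares plus a shift from the asymmetry.
        XtW = X.T * w  # equals X^T W with W = diag(w)
        rhs = XtW @ y + (2.0 * tau - 1.0) * X.sum(axis=0)
        beta_new = np.linalg.solve(XtW @ X, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Flooring |r_i| at `eps` keeps the weights finite when a residual reaches zero; with that floor each iteration is a majorize-minimize step for the smoothed check loss, so that objective decreases monotonically, although, like other reweighting schemes for L1-type losses, convergence can be slow.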

Keywords:

Expectation-maximization (EM) algorithm, Generalized alternating minimization (GAM) algorithm, high-dimensional estimation, mixture model, mixed effects regression, model diagnostics, variable selection.

Downloads:

Code and data in zipped archive.

