Statistical Modelling 15 (3) (2015), 215–232

Dirichlet Lasso: A Bayesian approach to variable selection

Kiranmoy Das
Bayesian and Interdisciplinary Research Unit,
Indian Statistical Institute,
Kolkata, India
e-mail: kiranmoy.das@gmail.com

Marc Sobel
Department of Statistics,
Temple University,
Philadelphia, USA


Abstract:

Selection of the most important predictor variables in regression analysis is one of the key problems with which statistical research has long been concerned. In this article, we propose a methodology, Dirichlet Lasso (abbreviated as DLASSO), to address this issue in a Bayesian framework. In many modern regression settings, a large set of predictor variables is grouped, and the coefficients belonging to any one of these groups are either all redundant or all important in predicting the response; in those cases we say that the predictors exhibit a group structure. We show that DLASSO is particularly useful where the group structure is not fully known. We exploit the clustering property of Dirichlet Process priors to infer the possibly missing group information. The Dirichlet Process has the advantage of simultaneously clustering the variable coefficients and selecting the best set of predictor variables. We compare the predictive performance of DLASSO to that of Group Lasso and ordinary Lasso using real data and simulation studies. Our results demonstrate that the predictive performance of DLASSO is almost as good as that of Group Lasso when group label information is given, and superior to that of the ordinary Lasso when group information is missing. For high-dimensional data (e.g., genetic data) with missing group information, DLASSO will be a powerful approach to variable selection, since it provides superior predictive performance and higher statistical accuracy.

Keywords:

Bayesian Lasso; Dirichlet process prior; Group lasso; Gibbs sampling; M-H algorithm.