Statistical Modelling 20 (1) (2020), 96–119

Component-based regularization of a multivariate GLM with a thematic partitioning of the explanatory variables

Xavier Bry,
Institut Montpelliérain Alexander Grothendieck,
Université Montpellier, CNRS,
Montpellier,
France.


Catherine Trottier,
Institut Montpelliérain Alexander Grothendieck,
Université Montpellier, CNRS,
Montpellier,
France.
e-mail: catherine.trottier@umontpellier.fr
and
Université Paul-Valéry Montpellier,
Montpellier,
France.


Frédéric Mortier,
Forêts et Sociétés,
Université Montpellier, Cirad,
Montpellier,
France.


Guillaume Cornu,
Forêts et Sociétés,
Université Montpellier, Cirad,
Montpellier,
France.


Abstract:

We address component-based regularization of a multivariate generalized linear model (GLM). A vector of random responses Y is assumed to depend, through a GLM, on a set X of explanatory variables, as well as on a set A of additional covariates. X is partitioned into R conceptually homogeneous variable groups X1,…,XR, viewed as explanatory themes. Variables in each Xr are assumed to be many and redundant. Thus, generalized linear regression demands dimension reduction and regularization with respect to each Xr. By contrast, variables in A are assumed to be few and selected so as to demand no regularization. Regularization is performed by searching each Xr for an appropriate number of orthogonal components that both contribute to modelling Y and capture relevant structural information in Xr. To estimate a single-theme model, we first propose an enhanced version of Supervised Component Generalized Linear Regression (SCGLR), based on a flexible measure of structural relevance of components, and able to deal with mixed-type explanatory variables. Then, to estimate the multiple-theme model, we develop an algorithm encapsulating this enhanced SCGLR: THEME-SCGLR. The method is tested on simulated data and then applied to rainforest data in order to model the abundance of tree species.

Keywords:

components; multivariate generalized linear model; regularization; SCGLR; dimension reduction.