Statistical Modelling 19 (5) (2019), 545–568

Incomplete graphical model inference via latent tree aggregation

Geneviève Robin,
Centre de Mathématiques Appliquées UMR 7641,
École Polytechnique,
X-POP, INRIA, Palaiseau,
France.
e-mail: genevieve.robin@polytechnique.edu

Christophe Ambroise,
Laboratoire de Mathématiques et Modélisation d’Évry Université Paris-Saclay,
Université d’Évry Val d’Essonne,
Évry,
France.


Stéphane Robin,
Mathématiques et informatique appliquées – Paris AgroParisTech,
INRA, Université Paris-Saclay,
Paris,
France.


Abstract:

Graphical network inference is used in many fields, such as genomics or ecology, to infer the conditional independence structure between variables, for instance from measurements of gene expression or species abundances. In many practical cases, not all variables involved in the network have been observed, and the samples are actually drawn from a distribution in which some variables have been marginalized out. This challenges the sparsity assumption commonly made in graphical model inference, since marginalization yields locally dense structures, even when the original network is sparse. We present a procedure for inferring Gaussian graphical models when some variables are unobserved, which accounts for both the influence of the missing variables and the sparsity of the original network. Our model is based on the aggregation of spanning trees, and the estimation procedure relies on the expectation-maximization (EM) algorithm. We treat the graph structure and the unobserved nodes as missing variables and compute posterior probabilities of edge appearance. To provide a complete methodology, we also propose several model selection criteria to estimate the number of missing nodes. A simulation study and an illustration on flow cytometry data show that our method has favourable edge detection properties compared to existing graph inference techniques. The methods are implemented in an R package.
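
The claim that marginalization yields locally dense structures can be checked on a toy example. The sketch below is not taken from the paper or its R package: the 4-node star graph, the matrix values and the helper name count_edges are all assumptions made for illustration. It relies on the standard fact that, for a Gaussian vector with precision matrix K, the precision matrix of the observed variables after marginalizing the hidden ones is the Schur complement of the hidden block.

```python
import numpy as np

# Hypothetical star graph: node 0 is a hub linked to nodes 1, 2 and 3.
# Nonzero off-diagonal entries of the precision matrix K are the edges.
K = np.array([
    [2.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
])

# K must be positive definite to define a Gaussian distribution.
assert np.all(np.linalg.eigvalsh(K) > 0)

hidden, observed = [0], [1, 2, 3]  # assumption: the hub is the unobserved node


def count_edges(precision, tol=1e-10):
    """Count nonzero entries strictly above the diagonal (illustrative helper)."""
    upper = np.triu(precision, k=1)
    return int(np.sum(np.abs(upper) > tol))


# Blocks of K restricted to observed and hidden nodes.
K_oo = K[np.ix_(observed, observed)]
K_oh = K[np.ix_(observed, hidden)]
K_hh = K[np.ix_(hidden, hidden)]

# Marginal precision of the observed nodes: Schur complement of K_hh in K.
K_marg = K_oo - K_oh @ np.linalg.inv(K_hh) @ K_oh.T

# Cross-check: invert the observed block of the covariance matrix.
Sigma = np.linalg.inv(K)
K_check = np.linalg.inv(Sigma[np.ix_(observed, observed)])

print("edges among observed nodes in the full graph:", count_edges(K_oo))
print("edges among observed nodes after marginalization:", count_edges(K_marg))
print("Schur complement matches covariance route:", np.allclose(K_marg, K_check))
```

In the full graph the three observed nodes share no edge, whereas after the hub is marginalized out every pair of them is connected: the neighbours of a missing node form a clique in the marginal graph, which is the kind of locally dense structure the abstract refers to.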

Keywords:

Gaussian graphical model; latent variables; EM algorithm; model selection.

Downloads:

Example data and code in zipped archive.