Statistical Modelling 12 (2012), 3–27

A solution to separation for clustered binary data

José Cortiñas Abrahantes
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics,
Hasselt University
Belgium
and
European Food Safety Authority (EFSA)
Via Carlo Magno 1A
I–43126 Parma
Italy
Email: jose.cortinasabrahantes@efsa.europa.eu

Marc Aerts
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics,
Hasselt University
Belgium

Abstract:

The presence of one or more covariates that perfectly or almost perfectly predict the outcome of interest has been studied extensively over the last four decades. This phenomenon is referred to as complete or quasi-complete separation, the latter denoting the case in which such perfect prediction occurs only for a subset of the observations in the data. Since Albert and Anderson (1984) distinguished between complete and quasi-complete separation, several authors have studied the phenomenon and tried to provide answers or ways of identifying the problem (Lesaffre and Albert, 1989; Firth, 1993; Christmann and Rousseeuw, 2001; Rousseeuw and Christmann, 2003; Allison, 2004; Zorn, 2005; Heinze, 2006). From an estimation perspective, separation leads to infinite coefficient estimates and standard errors, which causes the fitting algorithm to fail or to give inappropriate results. As a practical matter, separation forces the analyst to choose among a number of problematic alternatives; in the past, eliminating the offending variables was common practice. In the last decade, solutions based on penalized likelihood have been proposed, but only for independent binary data. Here we propose a Bayesian solution for clustered binary data, using informative priors that are supported by the data, and compare it with an alternative procedure proposed by Gelman et al. (2008).

Keywords:

Separation issues; clustered binary data; logistic model; Bayesian analysis; conditional models; penalized likelihood approach

Downloads:

R-code and example data in zipped archive