Statistical Modelling 19 (4) (2019), 362–385

Pólya–Aeppli regression model for overdispersed count data

Patrick Borges
Departamento de Estatística,
Universidade Federal do Espírito Santo,
Vitória, ES,
Brazil.


Luciana G. Godoi
Departamento de Estatística,
Universidade Federal do Espírito Santo,
Vitória, ES,
Brazil.
e-mail: luciana.godoi@ufes.br

Abstract:

The log-linear Poisson model, characterized by a linear variance function and a logarithmic relation between means and covariates, and embedded in the exponential family regression framework provided by generalized linear models (GLMs), is still the standard approach for analyzing count data responses with regression models. In practice, however, count data are often overdispersed and thus not well suited to Poisson regression. Therefore, the main goal of this article is to introduce a log-linear model based on the Pólya–Aeppli (PA) distribution, which extends the Poisson distribution by including a dispersion parameter ρ, to address the problem of overdispersion. A maximum likelihood (ML) estimation procedure is discussed, as well as a test for determining the need for a PA regression over a standard Poisson regression. In addition, a simple EM-type algorithm for iteratively computing ML estimates is presented. In order to study departures from the error assumption as well as the presence of outliers, we perform residual analysis based on the standardized Pearson residuals. Furthermore, various simulations are performed for different parameter settings and sample sizes. Finally, we illustrate the new method on three real datasets: two from biological research and one from a violence study.
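
As a rough illustration of how a dispersion parameter ρ extends the Poisson distribution, the sketch below evaluates the PA probability mass function in its common compound-Poisson form: a Poisson(λ) number of geometric summands on {1, 2, ...}. The (λ, ρ) parameterization and the function names pa_pmf, pa_mean and pa_var are assumptions made for this example. The article itself may parameterize the model differently, for instance through the mean.

```python
import math


def pa_pmf(y, lam, rho):
    """P(Y = y) for a Polya-Aeppli variable (assumed parameterization).

    Y is a sum of N ~ Poisson(lam) geometric variables on {1, 2, ...},
    each with P(X = k) = (1 - rho) * rho**(k - 1). Requires lam > 0 and
    0 <= rho < 1.
    """
    if y < 0:
        return 0.0
    if y == 0:
        return math.exp(-lam)
    if rho == 0.0:
        # rho = 0 gives the ordinary Poisson pmf.
        return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    total = 0.0
    for j in range(1, y + 1):
        # log of C(y-1, j-1) * (lam*(1-rho))**j / j! * rho**(y-j),
        # computed in log space to avoid overflow for large counts.
        log_term = (
            math.lgamma(y) - math.lgamma(j) - math.lgamma(y - j + 1)
            + j * math.log(lam * (1.0 - rho)) - math.lgamma(j + 1)
            + (y - j) * math.log(rho)
        )
        total += math.exp(log_term)
    return math.exp(-lam) * total


def pa_mean(lam, rho):
    """Mean of the PA distribution under the assumed parameterization."""
    return lam / (1.0 - rho)


def pa_var(lam, rho):
    """Variance; the ratio var/mean = (1 + rho) / (1 - rho) is >= 1."""
    return lam * (1.0 + rho) / (1.0 - rho) ** 2
```

Under this parameterization the variance-to-mean ratio is (1 + ρ)/(1 − ρ), so ρ = 0 gives the equidispersed Poisson case and any ρ > 0 gives overdispersion.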

Keywords:

bootstrap; EM algorithm; Generalized linear models (GLM); overdispersion; zero-inflated models.

Downloads:

Example data and code in zipped archive.