Statistical Modelling 20 (3) (2020), 249–273

A comparison of generalised linear models and compositional models for ordered categorical data

Ondřej Vencálek, Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic
e-mail: ondrej.vencalek@upol.cz

Karel Hron, Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic

Peter Filzmoser, Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria

Abstract:

Ordered categorical data occur in many applied fields, such as geochemistry, econometrics, sociology, demography and transportation research, for example in the form of questionnaire results. There are several possibilities for modelling the proportions of the individual categories. Generalised linear models (GLMs) are traditionally used for this purpose, but methods of compositional data analysis (CoDa) can also be considered. Here, both approaches are compared in depth. In particular, the models' different assumptions about variability are highlighted, and the advantages and disadvantages of the individual models are pointed out. While the CoDa model may be inappropriate when the variability of the compositional coordinates depends on the regressors, for example, due to different total counts on which the coordinates are based, the GLM may considerably underestimate the uncertainty of the predictions for large-scale data.
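The abstract's point that the variability of compositional coordinates can depend on the total counts behind them can be illustrated with a short simulation. The sketch below is not the authors' code: it assumes pivot (isometric logratio) coordinates, which are one common choice of orthonormal logratio coordinates and may differ from the coordinates used in the paper. The category probabilities, the two totals, the helper `simulate_first_coordinate` and the 0.5 pseudo-count used to avoid zero counts are all hypothetical choices made for illustration. With the same underlying probabilities, the first coordinate computed from small totals varies far more (roughly in proportion to 1/n) than the one computed from large totals, so a CoDa regression that assumes constant error variance would be misspecified when totals differ across observations.

```python
import math
import random
from statistics import pvariance


def pivot_coordinates(x):
    """Pivot (isometric logratio) coordinates of a composition x with D >= 2 positive parts.

    z_i = sqrt((D - i) / (D - i + 1)) * ln(x_i / gmean(x_{i+1}, ..., x_D)), i = 1, ..., D - 1.
    """
    d = len(x)
    if d < 2 or any(v <= 0 for v in x):
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(v) for v in x]
    z = []
    for i in range(d - 1):  # 0-based index; the formula's i corresponds to i + 1
        rest = logs[i + 1:]
        mean_log_rest = sum(rest) / len(rest)  # log of the geometric mean of the remaining parts
        k = d - i - 1  # number of remaining parts
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_log_rest))
    return z


def simulate_first_coordinate(probs, total, reps, rng, pseudo=0.5):
    """Hypothetical helper: draw multinomial counts and return the first pivot coordinate of each draw.

    The pseudo-count is an assumed zero-handling device, not taken from the paper.
    """
    values = []
    for _ in range(reps):
        draws = rng.choices(range(len(probs)), weights=probs, k=total)
        counts = [draws.count(j) + pseudo for j in range(len(probs))]
        values.append(pivot_coordinates(counts)[0])
    return values


rng = random.Random(2020)
probs = [0.2, 0.3, 0.5]  # hypothetical category probabilities, identical for both totals
var_small = pvariance(simulate_first_coordinate(probs, 50, 300, rng))
var_large = pvariance(simulate_first_coordinate(probs, 5000, 300, rng))
print(f"total = 50:   variance of z1 = {var_small:.4f}")
print(f"total = 5000: variance of z1 = {var_large:.5f}")
```

By contrast, a multinomial or cumulative-logit GLM builds the dependence on the totals into its variance function. When totals are very large, the variance it implies becomes very small, which is one way the uncertainty of predictions can be understated if the data vary more than the multinomial assumption allows.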

Keywords:

Ordered categories; logratio coordinates; generalised linear model; compositional data analysis; proportions; regression.

Downloads:

Example data and code in a zipped archive.