Statistical Modelling 18 (5-6) (2018), 460–482

Predicting matches in international football tournaments with random forests

Gunther Schauberger
Chair of Epidemiology,
Department of Sport and Health Sciences,
Technical University of Munich,
Germany.
e-mail: gunther.schauberger@tum.de

and

Department of Statistics,
Ludwig-Maximilians-Universität München,
Germany.


Andreas Groll
Faculty of Statistics,
Technische Universität Dortmund,
Germany.


Abstract:

Many approaches to analysing and predicting the results of international football matches are based on statistical models that incorporate several covariates potentially influencing a national team's success, such as bookmakers' ratings or the FIFA ranking. Based on all matches from the four previous FIFA World Cups 2002–2014, we compare the predictive performance of the most common regression models that use the teams' covariate information with that of an alternative modelling class, so-called random forests. Random forests can be seen as a mixture of machine learning and statistical modelling and are known for their high predictive power. We consider two types of random forests, which differ in the choice of response: one type predicts the exact number of goals, while the other predicts the three match outcomes (win, draw and loss) using special algorithms for ordinal responses. To account for the specific data structure of football matches, in particular at FIFA World Cups, the random forest methods are slightly modified relative to their standard versions.

Keywords:

random forests; football; FIFA World Cups; Poisson regression; regularization.

Downloads:

Example data and code in zipped archive.