Statistical Modelling 15 (2) (2015), 159–174

Sparse principal balances

Mehmet Can Mert
Department of Statistics and Probability Theory,
Vienna University of Technology,
Vienna,
Austria


Peter Filzmoser
Department of Statistics and Probability Theory,
Vienna University of Technology,
Vienna,
Austria
e-mail: P.Filzmoser@tuwien.ac.at

Karel Hron
Department of Mathematical Analysis and Applications of Mathematics,
Faculty of Science,
Palacký University,
Olomouc,
Czech Republic


and

Department of Geoinformatics,
Faculty of Science,
Palacký University,
Olomouc,
Czech Republic


Abstract:

Compositional data analysis deals with situations where the relevant information is contained only in the ratios between the measured variables, and not in the reported values. This article focuses on high-dimensional compositional data (hundreds or even thousands of variables), as they appear in chemometrics (e.g., mass spectral data), proteomics or genomics. The goal of this contribution is to perform a dimension reduction of such data in which the new directions remain interpretable. An approach named principal balances has proved successful in low dimensions. Here, the concept of sparse principal component analysis is proposed for constructing principal directions, the so-called sparse principal balances. They are sparse (contain many zeros), form an orthonormal basis in the sample space of the compositional data, are efficient for dimension reduction and are applicable to high-dimensional data.
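As an illustration of what a sparse, orthonormal direction for compositional data can look like, the following minimal sketch builds a single balance between two groups of parts and evaluates it on centred logratio (clr) coefficients. It is not the article's algorithm: the function names `clr` and `balance_vector`, the example composition and the chosen groups are assumptions made purely for illustration, and no sparse PCA step is performed.

```python
import numpy as np

def clr(x):
    """Centred logratio coefficients of a composition x (all parts > 0)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def balance_vector(n_parts, num, den):
    """Unit-norm, zero-sum loading vector of a balance between two groups.

    num, den: index lists of the parts in the numerator and denominator.
    Parts in neither group get a zero loading, which makes the vector sparse.
    """
    r, s = len(num), len(den)
    v = np.zeros(n_parts)
    v[num] = np.sqrt(s / (r * (r + s)))
    v[den] = -np.sqrt(r / (s * (r + s)))
    return v

# Hypothetical 5-part composition; only ratios between parts matter.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

v1 = balance_vector(5, num=[0, 1], den=[2])  # parts 0,1 against part 2
v2 = balance_vector(5, num=[0], den=[1])     # part 0 against part 1

coord = v1 @ clr(x)               # balance coordinate of x
coord_rescaled = v1 @ clr(10 * x) # same value: rescaling x changes nothing
```

In this sketch the loading vectors have unit norm, sum to zero, and are mutually orthogonal, while parts 3 and 4 receive zero loadings; this is the kind of structure the abstract refers to as sparse and orthonormal.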

Keywords:

principal component analysis; compositional data; isometric logratio transformation; sparseness.