Statistical Modelling 16 (2) (2016), 91–113

Using a latent variable model with non-constant factor loadings to examine PM2.5 constituents related to secondary inorganic aerosols

Zhenzhen Zhang
Department of Biostatistics,
University of Michigan,
Ann Arbor,
USA


Marie S. O'Neill
Department of Environmental Health Sciences,
University of Michigan,
Ann Arbor,
USA


and

Department of Epidemiology,
University of Michigan,
Ann Arbor,
USA


Brisa N. Sánchez
Department of Biostatistics,
University of Michigan,
Ann Arbor,
USA
e-mail: brisa@umich.edu

Abstract:

Factor analysis is a commonly used method of modelling correlated multivariate exposure data. Typically, the measurement model is assumed to have constant factor loadings. However, in preliminary analyses of the Environmental Protection Agency's (EPA's) PM2.5 fine speciation data, we observed that the factor loadings for four constituents change considerably in stratified analyses. Since invariance of factor loadings is a prerequisite for valid comparison of the underlying latent variables, we propose a factor model with non-constant factor loadings that vary over time and space, modelled with P-splines whose smoothing parameters are chosen by the generalized cross-validation (GCV) criterion. The model is fitted using the Expectation-Maximization (EM) algorithm, and the multiple spline smoothing parameters are selected by minimizing the GCV criterion with Newton's method during each iteration of the EM algorithm. The algorithm is applied to a one-factor model that includes four constituents. Using bootstrap confidence bands, we find that the factor loading for total nitrate changes across seasons and geographic regions.
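
As a rough illustration of the smoothing step mentioned in the abstract, the sketch below fits a one-dimensional P-spline (a B-spline basis with a difference penalty on the coefficients) and chooses the smoothing parameter by minimizing GCV. It is not the authors' implementation. It assumes a single smooth curve rather than factor loadings inside an EM algorithm, it replaces Newton's method with a simple grid search over candidate smoothing parameters, and the function names `bspline_basis` and `pspline_gcv` and the simulated data are hypothetical.

```python
import numpy as np

def bspline_basis(x, n_segments=20, degree=3):
    """Equally spaced B-spline basis on [min(x), max(x)] (Cox-de Boor recursion)."""
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_segments
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    # Assign the right endpoint to the last interior interval.
    B[x == hi, :] = 0.0
    B[x == hi, degree + n_segments - 1] = 1.0
    for d in range(1, degree + 1):
        left = (x[:, None] - knots[None, :-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[None, d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def pspline_gcv(x, y, lambdas, n_segments=20, degree=3, diff_order=2):
    """Fit a P-spline for each candidate lambda and keep the GCV minimizer."""
    B = bspline_basis(x, n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)  # difference matrix
    P = D.T @ D                                            # roughness penalty
    BtB, Bty = B.T @ B, B.T @ y
    n = len(y)
    best = None
    for lam in lambdas:
        A = BtB + lam * P
        coef = np.linalg.solve(A, Bty)
        edf = np.trace(np.linalg.solve(A, BtB))  # trace of the hat matrix
        rss = np.sum((y - B @ coef) ** 2)
        gcv = n * rss / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, coef, edf)
    return best, B

# Simulated example (hypothetical data, not the EPA speciation data).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
(gcv, lam, coef, edf), B = pspline_gcv(x, y, lambdas=10.0 ** np.arange(-4, 5))
print(f"selected lambda = {lam:g}, effective degrees of freedom = {edf:.2f}")
```

In a model with loadings that vary over both time and space, the basis would presumably be a tensor product of two such marginal bases with one smoothing parameter per dimension, which is why the abstract refers to multiple smoothing parameters.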

Keywords:

factor model; non-constant factor loading; P-spline; tensor product spline basis; generalized cross-validation; EM algorithm; PM2.5 constituents.

Downloads:

Example data and code in zipped archive.