Statistical Modelling 23 (4) (2023), 376–399
Bayesian analysis of two-part nonlinear latent variable model: Semiparametric method
Jian-Wei Gou,
Department of Applied Mathematics,
School of Science,
Nanjing Forestry University,
Nanjing,
Jiangsu,
China.
Ye-Mao Xia,
Department of Applied Mathematics,
School of Science,
Nanjing Forestry University,
Nanjing,
Jiangsu,
China.
e-mail: ym_xia71@163.com
De-Peng Jiang,
Department of Community Health Sciences,
University of Manitoba,
Manitoba,
Canada.
Abstract:
The two-part model (TPM) is a widely used statistical method for analyzing
semi-continuous data. Semi-continuous data can be viewed as arising from two distinct stochastic
processes: one governs the occurrence or binary part of data and the other determines the intensity or
continuous part. In the regression setting with the semi-continuous outcome as functions of covariates,
the binary part is commonly modelled via logistic regression and the continuous component via a
log-normal model. The conventional TPM still imposes assumptions, such as a log-normal distribution
of the continuous part, no unobserved heterogeneity among the responses, and no collinearity
among covariates, that are often unrealistic in practical applications. In this article, we develop
a two-part nonlinear latent variable model (TPNLVM) with mixed multiple semi-continuous and
continuous variables. The semi-continuous variables are treated as indicators in the latent factor
analysis along with other manifest variables. This reduces the dimensionality of the regression model
and alleviates the potential multicollinearity problems. Our TPNLVM can accommodate the nonlinear
relationships among latent variables extracted from the factor analysis. To downweight the influence
of distributional deviations and extreme observations, we develop a Bayesian semiparametric analysis
procedure. The conventional parametric assumptions on the related distributions are relaxed and the
Dirichlet process (DP) prior is used to improve model fitting. By taking advantage of the discreteness of
DP, our method is effective in capturing the heterogeneity underlying the population. Within the Bayesian
paradigm, posterior inference, including parameter estimates and model assessment, is carried out
through Markov chain Monte Carlo (MCMC) sampling. To facilitate posterior sampling,
we adapt the Polya-Gamma stochastic representation for the logistic model. Using simulation studies,
we examine the properties and merits of the proposed method and illustrate our approach by evaluating
the effect of treatment on cocaine use and examining whether the treatment effect is moderated by
psychiatric problems.
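
As a rough illustration of the conventional TPM described in the abstract (not the authors' TPNLVM and not their R code), the following minimal sketch simulates semi-continuous outcomes with a logistic binary part and a log-normal continuous part. The function name `simulate_tpm`, the single covariate and all parameter values are hypothetical choices made only for this example.

```python
import math
import random


def simulate_tpm(x, alpha, beta, sigma, rng):
    """Draw one semi-continuous outcome from a conventional two-part model.

    alpha, beta: (intercept, slope) pairs; all values here are illustrative.
    """
    # Binary part: logistic regression for the probability of a nonzero outcome.
    p_nonzero = 1.0 / (1.0 + math.exp(-(alpha[0] + alpha[1] * x)))
    if rng.random() >= p_nonzero:
        return 0.0
    # Continuous part: log-normal intensity, i.e. log(y) is normally distributed.
    return math.exp(rng.gauss(beta[0] + beta[1] * x, sigma))


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [simulate_tpm(x, (-0.5, 1.0), (1.0, 0.5), 0.7, rng) for x in xs]

# Share of exact zeros: the point mass that makes the data semi-continuous.
zero_share = sum(y == 0.0 for y in ys) / len(ys)
```

The simulated outcomes mix a point mass at zero with a right-skewed positive part, which is the data structure the two stochastic processes in the abstract are meant to capture.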
Keywords:
Markov chain Monte Carlo, semi-parametric Bayesian methods, semi-continuous data,
truncated Dirichlet process, two-part nonlinear latent variable model
Downloads:
Data and R Code, Supplementary material.