Statistical Modelling 23 (4) (2023), 376–399

Bayesian analysis of two-part nonlinear latent variable model: Semiparametric method

Jian-Wei Gou, Department of Applied Mathematics, School of Science, Nanjing Forestry University, Nanjing, Jiangsu, China.

Ye-Mao Xia, Department of Applied Mathematics, School of Science, Nanjing Forestry University, Nanjing, Jiangsu, China.
e-mail: ym_xia71@163.com

De-Peng Jiang, Department of Community Health Sciences, University of Manitoba, Manitoba, Canada.


Abstract:

The two-part model (TPM) is a widely used statistical method for analyzing semi-continuous data. Semi-continuous data can be viewed as arising from two distinct stochastic processes: one governs the occurrence, or binary part, of the data and the other determines the intensity, or continuous part. In the regression setting, where the semi-continuous outcome is modelled as a function of covariates, the binary part is commonly modelled via logistic regression and the continuous part via a log-normal model. The conventional TPM, however, still imposes assumptions that are often unrealistic in practical applications, such as a log-normal distribution for the continuous part, no unobserved heterogeneity among the responses, and no collinearity among the covariates. In this article, we develop a two-part nonlinear latent variable model (TPNLVM) with multiple semi-continuous and continuous variables. The semi-continuous variables, together with the other manifest variables, are treated as indicators of latent factors in a factor analysis. This reduces the dimensionality of the regression model and alleviates potential multicollinearity problems. Our TPNLVM can accommodate nonlinear relationships among the latent variables extracted from the factor analysis. To downweight the influence of distributional deviations and extreme observations, we develop a Bayesian semiparametric analysis procedure: the conventional parametric assumptions on the related distributions are relaxed, and a Dirichlet process (DP) prior is used to improve model fit. By taking advantage of the discreteness of the DP, our method is effective in capturing the heterogeneity underlying the population. Within the Bayesian paradigm, posterior inference, including parameter estimation and model assessment, is carried out through Markov chain Monte Carlo (MCMC) sampling. To facilitate posterior sampling, we adopt the Pólya-Gamma stochastic representation for the logistic model.
Using simulation studies, we examine the properties and merits of the proposed methods, and we illustrate our approach by evaluating the effect of treatment on cocaine use and examining whether the treatment effect is moderated by psychiatric problems.
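The conventional two-part structure described above (a logistic model for whether the outcome is positive, and a log-normal model for its size when positive) can be illustrated with a minimal sketch of its log-likelihood. This is the parametric baseline that the article relaxes, not the TPNLVM itself: it uses observed covariates only, with no latent factors, nonlinear structural equations, or Dirichlet process priors. The helper `two_part_loglik` and the parameter names `alpha`, `beta` and `sigma` are hypothetical, chosen for illustration.

```python
import numpy as np

def two_part_loglik(y, X, alpha, beta, sigma):
    """Log-likelihood of a conventional two-part model (hypothetical helper).

    Binary part:     P(y_i > 0) = logistic(X_i @ alpha)
    Continuous part: log(y_i) | y_i > 0 ~ Normal(X_i @ beta, sigma^2)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    eta = X @ alpha                          # linear predictor of the binary part
    log_p_pos = -np.logaddexp(0.0, -eta)     # log P(y > 0), numerically stable
    log_p_zero = -np.logaddexp(0.0, eta)     # log P(y = 0)
    pos = y > 0
    ll = np.sum(log_p_zero[~pos]) + np.sum(log_p_pos[pos])
    # Log-normal density of the positive values:
    # log f(y) = -log y - log sigma - 0.5 log(2 pi) - 0.5 ((log y - mu) / sigma)^2
    z = np.log(y[pos])
    mu = X[pos] @ beta
    ll += np.sum(-z - np.log(sigma) - 0.5 * np.log(2 * np.pi)
                 - 0.5 * ((z - mu) / sigma) ** 2)
    return ll

# Usage: simulate semi-continuous data from assumed "true" parameters.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
alpha_true = np.array([0.3, 1.0])
beta_true = np.array([1.0, 0.5])
sigma_true = 0.8
p = 1.0 / (1.0 + np.exp(-X @ alpha_true))    # occurrence probabilities
occur = rng.random(n) < p
y = np.where(occur, np.exp(X @ beta_true + sigma_true * rng.normal(size=n)), 0.0)
print(two_part_loglik(y, X, alpha_true, beta_true, sigma_true))
```

Because the two parts share no parameters in this baseline, the log-likelihood separates into a logistic term and a log-normal term. This separability is what the latent-factor formulation in the article gives up in exchange for handling heterogeneity and collinearity.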
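The truncated Dirichlet process prior listed in the keywords is commonly built with the stick-breaking construction, and its discreteness is what lets observations share atoms, and thus form clusters that capture heterogeneity. The sketch below draws a truncated DP random measure with concentration `alpha` and `G` atoms. The standard normal base measure, the truncation level and the function names are assumptions for illustration, not the article's specification.

```python
import numpy as np

def truncated_dp_draw(alpha, G, rng):
    """Draw weights and atoms of a truncated DP via stick-breaking (illustrative)."""
    v = rng.beta(1.0, alpha, size=G)     # stick-breaking fractions v_k ~ Beta(1, alpha)
    v[-1] = 1.0                          # truncation: the last stick takes all remaining mass
    # pi_k = v_k * prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * remaining
    atoms = rng.normal(0.0, 1.0, size=G)  # assumed base measure H = N(0, 1)
    return weights, atoms

def sample_from_dp(weights, atoms, n, rng):
    """Draw n values from the discrete random measure; shared labels are clusters."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return atoms[labels], labels

# Usage: 100 draws typically occupy only a few of the 20 available atoms.
rng = np.random.default_rng(1)
w, a = truncated_dp_draw(alpha=1.0, G=20, rng=rng)
x, labels = sample_from_dp(w, a, n=100, rng=rng)
print(w.sum(), len(np.unique(labels)))
```

Fixing the final fraction at one makes the truncated weights sum to one exactly, so no renormalization is needed. A smaller `alpha` concentrates mass on fewer atoms and yields fewer distinct clusters.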

Keywords:

Markov chain Monte Carlo, semi-parametric Bayesian methods, semi-continuous data, truncated Dirichlet process, two-part nonlinear latent variable model

Downloads:

Data and R Code, Supplementary material.

