Use of SAS for Multinomial Random Effects Models by Jonathan Hartzel, Alan Agresti, Brian Caffo Recently, the SAS procedure NLMIXED has become available for ML fitting of generalized linear mixed models. It is possible to adapt NLMIXED for parametric fitting of multinomial logit models with random effects by using the capability of specifying a general log likelihood function. This procedure uses the adaptive Gauss-Hermite quadrature routine. Thus, with sufficiently many quadrature points, it gives ML estimates for such models. Because it uses quadrature, however, it is limited to relatively simply random effects structure such as the models in this article have. The accompanying table shows SAS code for fitting the baseline-categories logit and adjacent-categories logit models discussed in Section 5 of the paper. In our experience, the default number of quadrature points in SAS is often inadequate to give proper convergence to ML estimates and their standard errors. Thus, we recommend sequential fitting with a successively increasing number of points (using the `qpoints = ' option in the PROC NLMIXED statement) until convergence appears to have occurred. With the Gauss-Hermite quadrature approach to ML inference with GLMMs, computer underflow can occur due to the multiplication of many very small probabilities. This is not a problem when using the MCEM algorithm, as the probabilities are added on the log scale in the expectation of the complete data log-likelihood. The calculations needed to run the rejection sampler for the MCEM algorithm can also be performed on the log scale. With Gauss-Hermite quadrature, computer underflow can be a problem mainly when there are many within-cluster observations. For most data sets in our experience, however, it is the number of clusters that is large and not the number of observations within a cluster. In using NLMIXED, we addressed this problem by assigning the likelihood to a very small number within the limits of computer precision. Specifically we entered if (z > 1e-8) then ll = log(z); else ll=-1e100 for this purpose. Example of SAS code (Version 8) for using PROC NLMIXED to fit adjacent-categories logit random effects model (22) and baseline-category logit model (23) to Table 1. /*Input Data*/ data movie; do lyons = 1 to 3; do siskel = 1 to 3; do medved = 1 to 3; do ebert = 1 to 3; input count @@; if (count ne 0) then output; end; end; end; end; datalines; 15 2 0 2 0 0 8 0 2 0 3 0 2 3 0 3 2 1 1 0 1 1 2 1 1 1 2 2 0 0 2 1 0 0 1 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 1 3 1 1 0 0 0 4 1 0 0 0 1 0 0 0 1 3 0 0 0 1 2 1 0 2 2 4 ; /*Create subject variable "movie" and code dummy variables*/ data movie; set movie; movie = _n_; x1 = 1; x2 = 0; x3 = 0; resp = siskel; output; x1 = 0; x2 = 1; x3 = 0; resp = ebert; output; x1 = 0; x2 = 0; x3 = 1; resp = lyons; output; x1 = 0; x2 = 0; x3 = 0; resp = medved; output; drop lyons siskel medved ebert; /*Recode responses as multivariate Bernoulli*/ data movie; set movie; if resp = 1 then do; y1=1; y2=0; y3=0; end; else if resp = 2 then do; y1=0; y2=1; y3=0; end; else if resp = 3 then do; y1=0; y2=0; y3=1; end; else delete; output; drop resp; run; /*Run NLMIXED; add qpoints=? to change number of quadrature points*/ /*Adjacent Category Logit Model*/ proc nlmixed data=movie; /*Code linear predictors*/ eta1 = alpha1 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u1; eta2 = alpha2 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u2; /*Adjacent category model*/ pi1 = exp(eta1+eta2) / (exp(eta1+eta2) + exp(eta2) + 1); pi2 = exp(eta2) / (exp(eta1+eta2) + exp(eta2) + 1); pi3 = 1 / (exp(eta1+eta2) + exp(eta2) + 1); /*Define likelihood*/ z = (pi1**y1)*(pi2**y2)*(pi3**y3); if (z > 1e-8) then ll = log(z); else ll=-1e100; model y1 ~ general(ll); /*Specify random effect distribution*/ random u1 u2 ~ normal([0,0], [s1*s1, cov12 ,s2*s2]) subject = movie; /*Replicate data points with identical likelihood values*/ replicate count; run; /*Adjacent Category Logit Model with Uncorrelated Random Effects*/ proc nlmixed data=movie qpoints = 50; /*Code linear predictors*/ eta1 = alpha1 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u1; eta2 = alpha2 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u2; /*Adjacent category*/ pi1 = exp(eta1+eta2) / (exp(eta1+eta2) + exp(eta2) + 1); pi2 = exp(eta2) / (exp(eta1+eta2) + exp(eta2) + 1); pi3 = 1 / (exp(eta1+eta2) + exp(eta2) + 1); z = (pi1**y1)*(pi2**y2)*(pi3**y3); if (z > 1e-8) then ll = log(z); else ll=-1e100; model y1 ~ general(ll); random u1 u2 ~ normal([0,0], [s1*s1, 0 ,s2*s2]) subject = movie; replicate count; run; /*Baseline Category Logit Model*/ proc nlmixed data=movie; /*Code linear predictors*/ eta1 = alpha1 + x1 * beta11 + x2 * beta21 + x3 * beta31 + u1; eta2 = alpha2 + x1 * beta12 + x2 * beta22 + x3 * beta32 + u2; /*Baseline category model*/ pi1 = exp(eta1) / (exp(eta1) + exp(eta2) + 1); pi2 = exp(eta2) / (exp(eta1) + exp(eta2) + 1); pi3 = 1 / (exp(eta1) + exp(eta2) + 1); /*Define likelihood*/ z = (pi1**y1)*(pi2**y2)*(pi3**y3); if (z > 1e-8) then ll = log(z); else ll = -1e100; model y1 ~ general(ll); /*Specify random effect distribution*/ random u1 u2 ~ normal([0,0], [s1*s1, cov12 ,s2*s2]) subject = movie; /*Replicate data points with identical likelihood values*/ replicate count; run;