Use of SAS for Multinomial Random Effects Models

         by Jonathan Hartzel, Alan Agresti, Brian Caffo


Recently, the SAS procedure NLMIXED has become available for ML
fitting of generalized linear mixed models.  It is possible to adapt
NLMIXED for parametric fitting of multinomial logit models with random
effects by using the capability of specifying a general log likelihood
function.  This procedure uses the adaptive Gauss-Hermite quadrature
routine.  Thus, with sufficiently many quadrature points, it gives ML
estimates for such models.  Because it uses quadrature, however, it is
limited to relatively simply random effects structure such as the
models in this article have.

The accompanying table shows SAS code for fitting the
baseline-categories logit and adjacent-categories logit models
discussed in Section 5 of the paper.  In our experience, the default
number of quadrature points in SAS is often inadequate to give proper
convergence to ML estimates and their standard errors.  Thus, we
recommend sequential fitting with a successively increasing number of
points (using the `qpoints = ' option in the PROC NLMIXED statement)
until convergence appears to have occurred.

With the Gauss-Hermite quadrature approach to ML inference with GLMMs,
computer underflow can occur due to the multiplication of many very
small probabilities.  This is not a problem when using the MCEM
algorithm, as the probabilities are added on the log scale in the
expectation of the complete data log-likelihood.  The calculations
needed to run the rejection sampler for the MCEM algorithm can also be
performed on the log scale.  With Gauss-Hermite quadrature, computer
underflow can be a problem mainly when there are many within-cluster
observations.  For most data sets in our experience, however, it is
the number of clusters that is large and not the number of
observations within a cluster.  In using NLMIXED, we addressed this
problem by assigning the likelihood to a very small number within the
limits of computer precision.  Specifically we entered

if (z > 1e-8) then ll = log(z); else ll=-1e100

for this purpose.


  Example of SAS code (Version 8) for using PROC NLMIXED to
  fit adjacent-categories logit random effects model (22) and
  baseline-category logit model (23) to Table 1.

/*Input Data*/
data movie;
do lyons = 1 to 3;
 do siskel = 1 to 3;
  do medved = 1 to 3;
   do ebert = 1 to 3;
      input count @@;
      if (count ne 0) then output;
   end;
  end;
 end;
end;
datalines;
15 2 0   2 0 0   8 0 2
 0 3 0   2 3 0   3 2 1
 1 0 1   1 2 1   1 1 2

 2 0 0   2 1 0   0 1 0
 1 1 0   0 1 0   0 1 0
 1 0 0   0 0 0   0 1 1

 3 1 1   0 0 0   4 1 0
 0 0 1   0 0 0   1 3 0
 0 0 1   2 1 0   2 2 4
;


/*Create subject variable "movie" and code dummy variables*/
data movie;
set movie;
movie = _n_;
x1 = 1; x2 = 0; x3 = 0; resp = siskel; output;
x1 = 0; x2 = 1; x3 = 0; resp = ebert;  output;
x1 = 0; x2 = 0; x3 = 1; resp = lyons;  output;
x1 = 0; x2 = 0; x3 = 0; resp = medved; output;
drop lyons siskel medved ebert;

/*Recode responses as multivariate Bernoulli*/
data movie;
set movie;
  if resp = 1 then do;
     y1=1; y2=0; y3=0;
  end;
  else if resp = 2 then do;
     y1=0; y2=1; y3=0;
  end;
  else if resp = 3 then do;
     y1=0; y2=0; y3=1;
  end;
  else delete;
  output;
  drop resp;
run;

/*Run NLMIXED; add qpoints=? to change number of quadrature points*/
/*Adjacent Category Logit Model*/
proc nlmixed data=movie;
   /*Code linear predictors*/
   eta1 = alpha1 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u1;
   eta2 = alpha2 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u2;

   /*Adjacent category model*/
   pi1 = exp(eta1+eta2) / (exp(eta1+eta2) + exp(eta2) + 1);
   pi2 = exp(eta2) / (exp(eta1+eta2) + exp(eta2) + 1);
   pi3 = 1 / (exp(eta1+eta2) + exp(eta2) + 1);

   /*Define likelihood*/
   z = (pi1**y1)*(pi2**y2)*(pi3**y3);
   if (z > 1e-8) then ll = log(z);
   else ll=-1e100;
   model y1 ~ general(ll);

   /*Specify random effect distribution*/
   random  u1 u2 ~ normal([0,0], [s1*s1, cov12 ,s2*s2]) subject = movie;

   /*Replicate data points with identical likelihood values*/
   replicate count;
run;


/*Adjacent Category Logit Model with Uncorrelated Random Effects*/
proc nlmixed data=movie qpoints = 50;
   /*Code linear predictors*/
   eta1 = alpha1 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u1;
   eta2 = alpha2 + x1 * beta1 + x2 * beta2 + x3 * beta3 + u2;

   /*Adjacent category*/
   pi1 = exp(eta1+eta2) / (exp(eta1+eta2) + exp(eta2) + 1);
   pi2 = exp(eta2) / (exp(eta1+eta2) + exp(eta2) + 1);
   pi3 = 1 / (exp(eta1+eta2) + exp(eta2) + 1);


   z = (pi1**y1)*(pi2**y2)*(pi3**y3);
   if (z > 1e-8) then ll = log(z);
   else ll=-1e100;
   model y1 ~ general(ll);
   random  u1 u2 ~ normal([0,0], [s1*s1, 0 ,s2*s2]) subject = movie;
  replicate count;
run;

/*Baseline Category Logit Model*/
proc nlmixed data=movie;
   /*Code linear predictors*/
   eta1 = alpha1 + x1 * beta11 + x2 * beta21 + x3 * beta31 + u1;
   eta2 = alpha2 + x1 * beta12 + x2 * beta22 + x3 * beta32 + u2;

   /*Baseline category model*/
   pi1 = exp(eta1) / (exp(eta1) + exp(eta2) + 1);
   pi2 = exp(eta2) / (exp(eta1) + exp(eta2) + 1);
   pi3 = 1 / (exp(eta1) + exp(eta2) + 1);

   /*Define likelihood*/
   z = (pi1**y1)*(pi2**y2)*(pi3**y3);
   if (z > 1e-8) then ll = log(z);
   else ll = -1e100;
   model y1 ~ general(ll);

   /*Specify random effect distribution*/
   random  u1 u2 ~ normal([0,0], [s1*s1, cov12 ,s2*s2]) subject = movie;

   /*Replicate data points with identical likelihood values*/
   replicate count;
run;