Generalized Latent Variable Modeling by Skrondal and Rabe-Hesketh
Section 9.2: Respiratory Infection in Children

Random intercept logistic regression model and GEE

The data set used in this section is xerop.dat. We thank Al Sommer, Keith West, Joanne Katz, Scott Zeger and Patrick Heagerty for allowing us to make the data available.

The do-file is ichs.do.

The programs we use are xtgee, gllamm, gllasim and gllapred. You can find the last three programs and download them by issuing the command ssc describe gllamm and following instructions; see here for more information on installing the GLLAMM programs. For more information on these commands see http://www.gllamm.org.


Read and prepare the data
infile id resp cons age xero cosine sine female height stunted time age1 season time2 using xerop.dat
list id time age xero cosine sine female height stunted in 1/20, clean

           id   time   resp   age   xero   cosine   sine   female   height   stunted  
  1.   121013      1      0    31      0       -1      0        0       -3         0  
  2.   121013      2      0    34      0        0     -1        0       -3         0  
  3.   121013      3      0    37      0        1      0        0       -2         0  
  4.   121013      4      0    40      0        0      1        0       -2         0  
  5.   121013      5      1    43      0       -1      0        0       -2         0  
  6.   121013      6      0    46      0        0     -1        0       -3         0  
  7.   121113      1      0    -9      0       -1      0        1        2         0  
  8.   121113      2      0    -6      0        0     -1        1        0         0  
  9.   121113      3      0    -3      0        1      0        1       -1         0  
 10.   121113      4      0     0      0        0      1        1       -2         0  
 11.   121113      5      1     3      0       -1      0        1       -3         0  
 12.   121113      6      0     6      0        0     -1        1       -3         0  
 13.   121114      1      0   -26      0       -1      0        0        8         0  
 14.   121114      2      0   -23      0        0     -1        0        5         0  
 15.   121114      3      0   -20      0        1      0        0        3         0  
 16.   121114      4      1   -17      0        0      1        0        0         0  
 17.   121114      5      1   -14      0       -1      0        0        0         0  
 18.   121114      6      0   -11      0        0     -1        0        0         0  
 19.   121140      1      0   -19      0       -1      0        1        5         0  
 20.   121140      2      0   -16      1        0     -1        1        4         0  

Create indicator for last observation for each person for later use

sort id time
qui by id: gen last=_n==_N


Generalized estimating equations (GEE) using xtgee for Table 9.1, saving estimates using _estimates hold for later use (Figure 9.1). Iteration logs are not shown.

xtgee resp age xero cosine sine female height stunted, i(id) corr(exch) l(logit) f(binom) robust
_estimates hold gee

GEE population-averaged model                   Number of obs      =      1200
Group variable:                         id      Number of groups   =       275
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.4
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(7)       =     42.98
Scale parameter:                         1      Prob > chi2        =    0.0000

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        resp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0318552   .0062079    -5.13   0.000    -.0440225    -.019688
        xero |   .6211243   .4409799     1.41   0.159    -.2431803    1.485429
      female |   -.418182   .2367913    -1.77   0.077    -.8822843    .0459203
      cosine |  -.5680533   .1695157    -3.35   0.001    -.9002979   -.2358086
        sine |  -.1622827   .1461244    -1.11   0.267    -.4486812    .1241158
      height |  -.0475909   .0307661    -1.55   0.122    -.1078914    .0127096
     stunted |   .1492226   .4108152     0.36   0.716    -.6559605    .9544056
       _cons |  -2.421176   .1781838   -13.59   0.000     -2.77041   -2.071942
------------------------------------------------------------------------------

GEE with exponentiated coefficients (population averaged odds ratios)

xtgee resp age xero cosine sine female height stunted, i(id) corr(exch) l(logit) f(binom) robust eform

GEE population-averaged model                   Number of obs      =      1200
Group variable:                         id      Number of groups   =       275
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.4
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(7)       =     42.98
Scale parameter:                         1      Prob > chi2        =    0.0000

                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
        resp | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9686468   .0060133    -5.13   0.000     .9569324    .9805046
        xero |   1.861019    .820672     1.41   0.159     .7841301     4.41686
      female |   .6582424    .155866    -1.77   0.077     .4138365    1.046991
      cosine |   .5666274   .0960522    -3.35   0.001     .4064486    .7899318
        sine |   .8502008   .1242351    -1.11   0.267     .6384696    1.132147
      height |   .9535238   .0293362    -1.55   0.122     .8977251    1.012791
     stunted |   1.160931   .4769283     0.36   0.716     .5189434    2.597126
------------------------------------------------------------------------------

Random intercept model logistic regression model for Table 9.1 using gllamm (iteration log not shown).

gllamm resp age xero cosine sine female height stunted, i(id) l(logit) f(binom) nip(12) adapt

number of level 1 units = 1200
number of level 2 units = 275
 
Condition Number = 75.756193
 
gllamm model
 
log likelihood = -334.64731
 
------------------------------------------------------------------------------
        resp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0339926   .0073209    -4.64   0.000    -.0483413   -.0196438
        xero |   .6243137   .4803223     1.30   0.194    -.3171007    1.565728
      female |  -.4363833   .2576521    -1.69   0.090    -.9413721    .0686055
      cosine |  -.5938483   .1741618    -3.41   0.001    -.9351991   -.2524975
        sine |  -.1648182   .1747285    -0.94   0.346    -.5072797    .1776434
      height |  -.0480044   .0267659    -1.79   0.073    -.1004646    .0044557
     stunted |    .202302   .4414687     0.46   0.647    -.6629608    1.067565
       _cons |  -2.673176   .2237101   -11.95   0.000     -3.11164   -2.234712
------------------------------------------------------------------------------
 
 
Variances and covariances of random effects
------------------------------------------------------------------------------

 
***level 2 (id)
 
    var(1): .6493042 (.34913418)
------------------------------------------------------------------------------

Exponentiated coefficients (subject-specific odds ratios) using gllamm

gllamm, eform

number of level 1 units = 1200
number of level 2 units = 275
 
Condition Number = 75.756193
 
gllamm model
 
log likelihood = -334.64731
 
------------------------------------------------------------------------------
        resp |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9665787   .0070762    -4.64   0.000     .9528085    .9805479
        xero |   1.866964   .8967445     1.30   0.194     .7282574    4.786158
      female |   .6463699   .1665386    -1.69   0.090     .3900922    1.071014
      cosine |   .5521982   .0961718    -3.41   0.001     .3925077    .7768582
        sine |   .8480479   .1481781    -0.94   0.346     .6021313    1.194399
      height |   .9531296   .0255113    -1.79   0.073     .9044172    1.004466
     stunted |   1.224218   .5404538     0.46   0.647     .5153233    2.908288
------------------------------------------------------------------------------
 
 
Variances and covariances of random effects
------------------------------------------------------------------------------

 
***level 2 (id)
 
    var(1): .6493042 (.34913418)
------------------------------------------------------------------------------

Empirical Bayes predictions using gllapred (to be stored in ebm1)

gllapred eb, u

(means and standard deviations will be stored in ebm1 ebs1)
Non-adaptive log-likelihood: -334.64731
 -334.6473  -334.6473  
log-likelihood:-334.64731

Simulate three datasets using gllasim and obtain empirical Bayes predictions for Figure 9.2 (output not shown)

set seed 1322411
matrix aa=e(b)
gllasim y, from(aa)
gllamm y age xero cosine sine female height stunted, i(id) l(logit) f(binom) nip(12) from(aa) copy adapt
gllapred SIM1, u
drop y
gllasim y, from(aa)
gllamm y age xero cosine sine female height stunted, i(id) l(logit) f(binom)  nip(12) from(aa) copy adapt
gllapred SIM2, u
drop y
gllasim y, from(aa)
gllamm y age xero cosine sine female height stunted, i(id) l(logit) f(binom) nip(12) from(aa) copy adapt
gllapred SIM3, u
drop y


Make Figure 9.2 and store as ichs2.gph (note that simulated datasets are different due to different seed)

graph box ebm1 SIM1m1 SIM2m1 SIM3m1 if last==1, t2(" ") saving(ichs2, replace) /*
  */ legend(lab(1 "Data") lab(2 "Sim 1") lab(3 "Sim 2") lab(4 "Sim 3") ) /*
  */ marker(1, m(oh)) marker(2, m(oh)) marker(3, m(oh)) marker(4, m(oh)) lintensity(255)


Make Figure 9.3 and store as ichs3.gph

sort ebm1
gen rank = sum(last)
serrbar ebm1 ebs1 rank if last==1&mod(rank,6)==5, xlab(5 100 200 275)  /*
  */ ytitle("Predicted random intercept") saving(ichs3, replace)


Create small dataset with covariate values for which we want predictions for Figure 9.1

keep in 1/83
replace age = -32+_n-1
replace cosine=-1
replace sine= 0
replace xero=0
replace female=1
summ height
replace height=r(mean)
replace stunted=0
gen y=1


Obtain marginal and conditional probabilities for random intercept model using gllapred

gllapred mu_m, mu marg from(aa)
gen u01=0
gen u8p1 = 0.8
gen u8m1 = -0.8
gllapred mu_0, mu us(u0)
gllapred mu_8p, mu us(u8p)
gllapred mu_8m, mu us(u8m)


Obtain marginal probabilities for GEE by restoring the stored estimates using _estimates unhold and using Stata's predict command.

_estimates unhold gee
predict mu_g, mu
label variable mu_g "GEE"
label variable mu_0 "u=0"
label variable mu_8p "u=+0.8"
label variable mu_8m "u=-0.8"
label variable mu_m "pop. average"


Make Figure 9.1 (ichs.gph)

sort age
twoway (line mu_m age, clpatt(solid)) /*
  */ (scatter mu_g age if mod(age,2)==1, ms(oh) mc(black)) (line mu_0 age, clpatt(dash)) /*
  */ (line mu_8p age, clpatt(dash)) (line mu_8m age, clpatt(dash)), /*
  */ legend(order(1 2 3) label(3 "conditional") position(1) ring(0) ) /*
  */ ytitle("Probability of infection") /*
  */ xtitle("Age in months") saving(ichs, replace) 


References

Zeger, S. L. and Karim, M. R. (1991). Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association 86, 79-86.

Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data. Oxford: Oxford University Press.

Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2002). Reliable estimation of generalised linear mixed models using adaptive quadrature. The Stata Journal 2, 1-21.

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: Chapman & Hall/ CRC Press.

Outline
Datasets and do-files