The data set used in this section is cervix.dat and was taken from Carroll et al. (1993). Missing values of X are coded as -9.
The do-file is cervix.do.
The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.
Variables:D | dummy for case (cervical cancer) (1=case,0=control) |
X | true exposure to herpes (1=yes,0=no,-9=missing) |
W | measured exposure (1=yes,0=no) |
Read data
infile D X W using cervix.dat, clear
Estimates using true exposure only in validation sample (iteration log not shown):
logit D X if X~= -9 Logit estimates Number of obs = 115 LR chi2(1) = 2.95 Prob > chi2 = 0.0860 Log likelihood = -72.17832 Pseudo R2 = 0.0200 ------------------------------------------------------------------------------ D | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- X | .6813592 .3999444 1.70 0.088 -.1025174 1.465236 _cons | -1.011601 .2919371 -3.47 0.001 -1.583787 -.4394147 ------------------------------------------------------------------------------
Estimates using imperfect measure of exposure:
logit D W Logit estimates Number of obs = 2044 LR chi2(1) = 23.93 Prob > chi2 = 0.0000 Log likelihood = -1321.3944 Pseudo R2 = 0.0090 ------------------------------------------------------------------------------ D | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- W | .4528744 .0928123 4.88 0.000 .2709657 .6347831 _cons | -.802962 .0656468 -12.23 0.000 -.9316275 -.6742966 ------------------------------------------------------------------------------
Estimate joint model using gllamm for Table 14.3
Collapse and reshape data:
gen cons=1 collapse (sum) wt2=cons, by(D X W) gen patt=_n list patt D X W wt2, clean patt D X W wt2 1. 1 0 -9 0 701 2. 2 0 -9 1 535 3. 3 0 0 0 33 4. 4 0 0 1 11 5. 5 0 1 0 16 6. 6 0 1 1 16 7. 7 1 -9 0 318 8. 8 1 -9 1 375 9. 9 1 0 0 13 10. 10 1 0 1 3 11. 11 1 1 0 5 12. 12 1 1 1 18
Stack responses X,W,D into variable y and create a key to which response is of which type, var=1,2,3, respectively:
gen y1 = X gen y2 = W gen y3 = D reshape long y, i(patt) j(var) (note: j = 1 2 3) Data wide -> long ----------------------------------------------------------------------------- Number of obs. 12 -> 36 Number of variables 8 -> 7 j variable (3 values) -> var xij variables: y1 y2 y3 -> y ----------------------------------------------------------------------------- drop if y==-9
Create dummies for the three response types:
tab var, gen(d)
Dummy v for being in the validation sample and nv for not being in the validation sample:
gen v = X> -9 gen nv = 1-v
Interactions between variables and dummies for fixed part:
gen v_d1 = v*d1 gen X_v_d2 = X*v*d2 gen D_v_d2 = D*v*d2 gen D_nv_d2 = D*nv*d2 gen X_D_v_d2 = X*D*v*d2 gen X_v_d3 = X*v*d3
Interactions between variables and dummies for random part:
gen nv_d2 = nv*d2 * already defined: gen D_nv_d2 = D*nv*d2 gen nv_d3 = nv*d3 cons def 1 [z2_1_1]nv_d2=1 cons def 2 [z2_1_2]nv_d2=0 cons def 3 [pat1_1l]nv_d2 = [y]X_v_d2 cons def 4 [pat1_1l]D_nv_d2=[y]X_D_v_d2 cons def 5 [pat1_1l]nv_d3=[y]X_v_d3 cons def 6 [y]d1=[p2_1]_cons
Non-differential measurement error (a2=a3=0) for Table 14.3:
eq load: nv_d2 nv_d3 gllamm y d1 d2 X_v_d2 d3 X_v_d3, i(patt) l(logit) f(binom) nocons nip(2) /* */ ip(fn) eqs(load) constr(1 2 3 5 6) frload(1) weight(wt) number of level 1 units = 4203 number of level 2 units = 2044 Condition Number = 13.474097 gllamm model with constraints: ( 1) [z2_1_1]nv_d2 = 1 ( 2) [z2_1_2]nv_d2 = 0 ( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0 ( 4) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0 ( 5) [y]d1 - [p2_1]_cons = 0 log likelihood = -2805.194770938573 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- d1 | .0062648 .1647347 0.04 0.970 -.3166092 .3291388 d2 | -1.062541 .2244703 -4.73 0.000 -1.502495 -.6225876 X_v_d2 | 1.812779 .3653273 4.96 0.000 1.096751 2.528807 d3 | -1.097045 .1487521 -7.37 0.000 -1.388593 -.8054958 X_v_d3 | .9579213 .2365887 4.05 0.000 .494216 1.421627 ------------------------------------------------------------------------------ Probabilities and locations of random effects ------------------------------------------------------------------------------ ***level 2 (patt) loc1: 1, 0 var(1): .24999755 loadings for random effect 1 nv_d2: 1.8127791 (.36532727) nv_d3: .95792129 (.23658867) prob: 0.5016, 0.4984 ------------------------------------------------------------------------------
Response probabilities:
gllapred mu, mu
Probabilities of W for given X and D (Table 14.4):
list X D mu if v==1&d2==1, clean X D mu 6. 0 0 .25682409 9. 0 0 .25682409 12. 1 0 .67923049 15. 1 0 .67923049 22. 0 1 .25682409 25. 0 1 .25682409 28. 1 1 .67923049 31. 1 1 .67923049
Posterior probabilities of true exposure X (Table 14.5):
gllapred p, p sort patt list W D p1 p2 if nv==1, clean W D p1 p2 1. 0 0 .23651644 .76348356 2. 0 0 .23651644 .76348356 3. 1 0 .65495805 .34504195 4. 1 0 .65495805 .34504195 17. 0 1 .44671496 .55328504 18. 0 1 .44671496 .55328504 19. 1 1 .83185431 .16814569 20. 1 1 .83185431 .16814569 drop p1 p2 mu
Differential measurement error for Table 14.3:
eq load: nv_d2 D_nv_d2 nv_d3 gllamm y d1 d2 X_v_d2 D_v_d2 D_nv_d2 X_D_v_d2 d3 X_v_d3, i(patt) /* */ l(logit) f(binom) nocons nip(2) /* */ ip(fn) eqs(load) constr(1 2 3 4 5 6) frload(1) weight(wt) number of level 1 units = 4203 number of level 2 units = 2044 Condition Number = 37.207216 gllamm model with constraints: ( 1) [z2_1_1]nv_d2 = 1 ( 2) [z2_1_2]nv_d2 = 0 ( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0 ( 4) - [y]X_D_v_d2 + [pat1_1l]D_nv_d2 = 0 ( 5) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0 ( 6) [y]d1 - [p2_1]_cons = 0 log likelihood = -2802.626223555175 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- d1 | -.0242144 .1826211 -0.13 0.895 -.3821452 .3337164 d2 | -.7911943 .2586129 -3.06 0.002 -1.298066 -.2843224 X_v_d2 | 1.0986 .4962029 2.21 0.027 .1260605 2.07114 D_v_d2 | -.6751662 .6907708 -0.98 0.328 -2.029052 .6787197 D_nv_d2 | -.6587914 .6424359 -1.03 0.305 -1.917943 .6003598 X_D_v_d2 | 1.648709 .9550386 1.73 0.084 -.2231321 3.52055 d3 | -.8937903 .2212583 -4.04 0.000 -1.327449 -.460132 X_v_d3 | .6020423 .3966732 1.52 0.129 -.1754229 1.379508 ------------------------------------------------------------------------------ Probabilities and locations of random effects ------------------------------------------------------------------------------ ***level 2 (patt) loc1: 1, 0 var(1): .24996336 loadings for random effect 1 nv_d2: 1.0986002 (.49620285) D_nv_d2: 1.6487091 (.95503858) nv_d3: .60204233 (.39667322) prob: 0.4939, 0.5061 ------------------------------------------------------------------------------
Response probabilities:
gllapred mu, mu
Probabilities of W for given X and D (Table 14.4):
list X D mu if v==1&d2==1, clean X D mu 6. 0 0 .31191228 9. 0 0 .31191228 12. 1 0 .57625194 15. 1 0 .57625194 22. 0 1 .18749643 25. 0 1 .18749643 28. 1 1 .78261125 31. 1 1 .78261125
Posterior probabilities of true exposure X (Table 14.5):
gllapred p, p sort patt list W D p1 p2 if nv==1, clean W D p1 p2 1. 0 0 .32653164 .67346836 2. 0 0 .32653164 .67346836 3. 1 0 .5925908 .4074092 4. 1 0 .5925908 .4074092 17. 0 1 .27582333 .72417667 18. 0 1 .27582333 .72417667 19. 1 1 .85594791 .14405209 20. 1 1 .85594791 .14405209
Carroll, R. J., Gail, M. H. and Lubin, J. H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.
Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized
Latent Variable Modeling: Multilevel, Longitudinal and Structural
Equation Models. Boca Raton, FL: Chapman & Hall/ CRC Press.
Outline
Datasets and do-files