The data set used in this section is blood.dta
The do-file is blood.do.
The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.
Read data
use blood, clear list, clean y lnN A O B cA cAB cO cB ind clus id 1. 179 6.045005 2 0 0 1 0 0 0 1 1 1 2. 6 6.045005 1 1 0 1 0 0 0 2 1 1 3. 202 6.045005 1 0 1 0 1 0 0 3 1 1 4. 35 6.045005 1 1 0 1 0 0 0 4 1 1 5. 0 6.045005 0 2 0 0 0 1 0 0 1 1 6. 0 6.045005 0 1 1 0 0 0 1 0 1 1 7. 0 6.045005 1 0 1 0 1 0 0 0 1 1 8. 0 6.045005 0 1 1 0 0 0 1 0 1 1 9. 0 6.045005 0 0 2 0 0 0 1 0 1 1
The expected frequency for blood type A is given by the composite link
exp(lnN + 2a) + exp(lnN + a + o) + exp(lnN + o + a),
where a = ln(p) is the log relative frequency of gene A and o = ln(r) is the log relative frequency of gene O. The first term corresponds to genes (from mother and father, respectively) AA, the second to AO and the third to OA.
Estimate the model using gllamm with a composite link. The compos() option specifies:
- the grouping variable clus such that linear predictors within a group can be combined with each other to form a composite link for an observation within the group,
- the variable ind indicating to which response the composite links defined by the subsequent weight variables correspond: composite link based on weight variable 1 goes to where ind equals 1, etc.,
- the weight variables cA cAB cO cB for the four composite links:
gllamm y A O B, i(id) offset(lnN) fam(poiss) compos(clus ind cA cAB cO cB) init nocons eform number of level 1 units = 9 Condition Number = 4.918322 gllamm model log likelihood = -13.200915 ------------------------------------------------------------------------------ y | exp(b) Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- A | .25156 .0172837 -20.09 0.000 .2198665 .2878221 O | .6984284 .0240529 -10.42 0.000 .6528414 .7471986 B | .0500116 .0077113 -19.43 0.000 .0369679 .0676577 lnN | (offset) ------------------------------------------------------------------------------
The coefficients are the estimates of the gene frequencies of A (0.25), O (0.90) and B (0.05).
Obtain predicted counts
gllapred mu, mu list y mu, clean y mu 1. 179 174.99317 2. 6 10.618296 3. 202 205.85252 4. 35 30.536004 5. 0 . 6. 0 . 7. 0 . 8. 0 . 9. 0 .
Perform goodness-of-fit test by estimating saturated model
* store current log-likelihood local ll=e(ll) * dummies for the four counts: gen n=_n if y>0 quietly tab n, gen(dummy) * estimate saturated model using poisson quietly poisson y dummy*, nocons * compare log-likelihoods local deviance = 2*(e(ll)-`ll') local pval = chiprob(`deviance',1) disp "deviance = " `deviance' ", p-value = " `pval' deviance = 3.1732899, p-value = .82544424