R
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(MASS)
glmmod = glm(homeless ~ age + female, binomial, data=ds)
> summary(glmmod)
Call:
glm(formula = homeless ~ age + female, family = binomial, data = ds)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3600 -1.1231 -0.9185 1.2020 1.5466
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.89262 0.45366 -1.968 0.0491 *
age 0.02386 0.01242 1.921 0.0548 .
female -0.49198 0.22822 -2.156 0.0311 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 625.28 on 452 degrees of freedom
Residual deviance: 617.19 on 450 degrees of freedom
AIC: 623.19
Number of Fisher Scoring iterations: 4
> exp(confint(glmmod))
Waiting for profiling to be done...
2.5 % 97.5 %
(Intercept) 0.1669932 0.9920023
age 0.9996431 1.0496390
female 0.3885283 0.9522567
> library(MASS)
> exp(confint.default(glmmod))
2.5 % 97.5 %
(Intercept) 0.1683396 0.9965331
age 0.9995114 1.0493877
female 0.3909104 0.9563045
Why are they different? Which one is correct?
SAS
Fortunately the detailed documentation in SAS can help resolve this. The logistic procedure (section 4.1.1) offers the clodds option to the model statement. Setting this option to both produces two sets of CL, based on the Wald test and on the profile-likelihood approach. (Venzon, D. J. and Moolgavkar, S. H. (1988), “A Method for Computing Profile-Likelihood Based Confidence Intervals,” Applied Statistics, 37, 87–94.)
ods output cloddswald = waldcl cloddspl = plcl;
proc logistic data = "c:\book\help.sas7bdat" plots=none;
class female (param=ref ref='0');
model homeless(event='1') = age female / clodds = both;
run;
Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect Unit Estimate 95% Confidence Limits
AGE 1.0000 1.024 1.000 1.050
FEMALE 1 vs 0 1.0000 0.611 0.389 0.952
Odds Ratio Estimates and Wald Confidence Intervals
Effect Unit Estimate 95% Confidence Limits
AGE 1.0000 1.024 1.000 1.049
FEMALE 1 vs 0 1.0000 0.611 0.391 0.956
Unfortunately, the default precision of the printout isn't quite sufficient to identify whether this distinction aligns with the differences seen in the two R methods. We get around this by using the ODS system to save the output as data sets (section A.7.1). Then we can print the data sets, removing the default rounding formats to find all of the available precision.
title "Wald CL";
proc print data=waldcl; format _all_; run;
title "PL CL";
proc print data=plcl; format _all_; run;
Wald CL
Odds
Obs Effect Unit RatioEst LowerCL UpperCL
1 AGE 1 1.02415 0.99951 1.04939
2 FEMALE 1 vs 0 1 0.61143 0.39092 0.95633
PL CL
Odds
Obs Effect Unit RatioEst LowerCL UpperCL
1 AGE 1 1.02415 0.99964 1.04964
2 FEMALE 1 vs 0 1 0.61143 0.38853 0.95226
With this added precision, we can see that the confint.default() function in the MASS library generates the Wald confidence limits, while the confint() function produces the profile-likelihood limits. This also explains the confint() comment "Waiting for profiling to be done..." Thus neither CI from the MASS library is incorrect, though the profile-likelihood method is thought to be superior, especially for small sample sizes. Little practical difference is seen here.
I believe that confint.default() is actually in the 'stats' package, while confint.glm() is nominally in the 'stats' package, but that version is just a wrapper for confint.glm() in the MASS package.
ReplyDeleteI really appreciate this post. The discrepancy between R and SAS CI's has been eluding me for weeks. Thanks a million.
ReplyDelete