This phenomenon, known as "separation" (encompassing both complete and quasi-complete separation), will cause problems when fitting the model. Sometimes the only symptom of separation is extremely large standard errors, while at other times the software may report an error or a warning.

One approach to handling this sort of problem is exact logistic regression, which we discuss in section 4.1.2. But exact logistic regression is complex and may require prohibitive computational resources. Another option is to use a Bayesian approach. Here we show how to use a penalized likelihood method originally proposed by Firth (1993, Biometrika 80:27-38) and described fully in this setting by Georg Heinze (2002, Statistics in Medicine 21:2409-2419, and 2006, Statistics in Medicine 25:4216-4226). A nice summary of the method is shown on a web page that Heinze maintains. In later entries we'll consider the Bayesian and exact approaches.
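For reference, here is the penalty in Firth's method, which the post doesn't write out. Instead of maximizing the usual log-likelihood, one maximizes a log-likelihood augmented by half the log-determinant of the Fisher information (equivalent to using a Jeffreys invariant prior):

```latex
\ell^{*}(\beta) \;=\; \ell(\beta) \;+\; \tfrac{1}{2}\log\left|I(\beta)\right|
```

Unlike the ordinary likelihood under separation, this penalized likelihood attains its maximum at finite parameter values, which is why the method produces usable estimates below.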

**Update:** see the bottom of the post.

**SAS**

In SAS, the corrected estimates can be found using the `firth` option to the `model` statement in `proc logistic`. We'll set up the problem in the simple setting of a 2x2 table with an empty cell. Here, we simply output three observations with the three observed combinations of predictor and outcome, along with a `weight` variable which contains the case count in each cell of the table.

```sas
data testfirth;
  pred=1; outcome=1; weight=20;  output;
  pred=0; outcome=1; weight=20;  output;
  pred=0; outcome=0; weight=200; output;
run;
```

In the `proc logistic` code, we use the `weight` statement, available in many procedures, to indicate how many times each observation is to be replicated before the analysis. This approach can save a lot of space.

```sas
proc logistic data = testfirth;
  class outcome pred (param=ref ref='0');
  model outcome(event='1') = pred / cl firth;
  weight weight;
run;
```

Without the `firth` option, the parameter estimate is 19.7 with a standard error of 1349. In contrast, here is the result of the above code.

```
            Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter      DF   Estimate      Error    Chi-Square    Pr > ChiSq

Intercept       1    -2.2804     0.2324       96.2774        <.0001
pred       1    1     5.9939     1.4850       16.2926        <.0001
```

Note that there is no suggestion in this part of the output that the Firth method was employed; that appears only at the very top of the voluminous output.
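As a quick sanity check on these numbers (my own aside, not part of the original post): for a saturated model on a 2x2 table, Firth's penalty turns out to be equivalent to adding 1/2 to each cell of the table, so the penalized estimates can be reproduced by hand from the cell counts:

```python
import math

# 2x2 table from the example, with 1/2 added to each cell
# (Firth's penalty reduces to this correction in the 2x2 case):
#              outcome=1   outcome=0
#   pred=1        20.5        0.5
#   pred=0        20.5      200.5

intercept = math.log(20.5 / 200.5)                # log odds when pred=0
slope = math.log((20.5 * 200.5) / (0.5 * 20.5))   # log odds ratio

print(round(intercept, 4))  # -2.2804, matching the SAS output
print(round(slope, 4))      # 5.994, matching the 5.9939 above
```

The empty cell contributes the 0.5 in the denominator of the odds ratio, which is what keeps the estimate finite.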

**R**

In R, we can use Heinze's `logistf` package, which includes the `logistf()` function. We'll make the same data as in SAS by constructing two vectors of length 240 using the `c()` and `rep()` functions.

```r
pred = c(rep(1,20), rep(0,220))
outcome = c(rep(1,40), rep(0,200))
lr1 = glm(outcome ~ pred, binomial)
summary(lr1)
```

```
Call:
glm(formula = outcome ~ pred, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.4366  -0.4366  -0.4366  -0.4366   2.1899

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.3026     0.2345  -9.818   <2e-16 ***
pred         20.8687  1458.5064   0.014    0.989
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 216.27  on 239  degrees of freedom
Residual deviance: 134.04  on 238  degrees of freedom
AIC: 138.04

Number of Fisher Scoring iterations: 17
```

Note that the unpenalized estimate (20.9) differs slightly from what SAS reports (19.7); neither number is meaningful, since the true maximum likelihood estimate is infinite. Here's a more plausible answer.

```r
library(logistf)
lr2 = logistf(outcome ~ pred)
summary(lr2)
```

```
logistf(formula = outcome ~ pred)

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

                 coef  se(coef) lower 0.95 upper 0.95 Chisq p
(Intercept) -2.280389 0.2324057  -2.765427  -1.851695   Inf 0
pred         5.993961 1.4850029   3.947048  10.852893   Inf 0

Likelihood ratio test=78.95473 on 1 df, p=0, n=240
Wald test = 16.29196 on 1 df, p = 5.429388e-05

Covariance-Matrix:
            [,1]        [,2]
[1,]  0.05401242 -0.05401242
[2,] -0.05401242  2.20523358
```

Here, the estimates and standard errors are nearly identical to SAS, but the confidence limits differ: `logistf()` reports profile penalized likelihood intervals, while SAS reported Wald intervals.
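To make the penalized-likelihood machinery concrete, here is a from-scratch sketch of Firth's modified Newton iteration in Python/NumPy. This is my illustration, not code from either package (the function name is my own); it implements the modified score from Firth (1993) and Heinze &amp; Schemper (2002) and reproduces the estimates and standard errors shown above:

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-10):
    """Firth-penalized logistic regression via Newton iterations.

    The modified score is U*(b) = X'(y - p + h*(1/2 - p)),
    where h holds the diagonal of the weighted hat matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                          # Bernoulli variances
        XWX = X.T @ (W[:, None] * X)               # Fisher information I(b)
        XWX_inv = np.linalg.inv(XWX)
        h = W * np.sum((X @ XWX_inv) * X, axis=1)  # hat-matrix diagonals
        step = XWX_inv @ (X.T @ (y - p + h * (0.5 - p)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # standard errors from the inverse information at the solution
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (W[:, None] * X))))
    return beta, se

# rebuild the 240-observation data set from the post
pred = np.r_[np.ones(20), np.zeros(220)]
outcome = np.r_[np.ones(40), np.zeros(200)]
X = np.column_stack([np.ones(240), pred])

beta, se = firth_logistic(X, outcome)
print(np.round(beta, 4))  # approx [-2.2804  5.994 ]
print(np.round(se, 4))    # approx [0.2324  1.485 ]
```

The only change from ordinary Newton scoring for logistic regression is the `h * (0.5 - p)` adjustment to the residuals, yet it is enough to keep the estimates finite under separation.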

**Update:**

Georg Heinze, author of the `logistf()` function, suggests the following two items.

First, in SAS, the `model` statement option `clparm=pl` will generate profile penalized likelihood confidence intervals, which should be similar to those from `logistf()`. It certainly makes sense to use confidence limits that more closely reflect the fitting method.

Second, in R, there is a `weights` argument to both `glm()` and `logistf()` that is similar to the `weight` statement in SAS. For example, the data used above could have been entered and run as:

```r
pred = c(1,0,0)
outcome = c(1,1,0)
weight = c(20,20,200)
lr1 = glm(outcome ~ pred, binomial, weights=weight)
lr2 = logistf(outcome ~ pred, weights=weight)
```
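The reason frequency weights work is that each weighted row contributes its weight times the usual log-likelihood term, so the 3-row weighted data and the 240-row expanded data yield identical likelihoods at every parameter value. A small Python illustration of this equivalence (my own sketch, not from the post; the function name is hypothetical):

```python
import numpy as np

def bernoulli_loglik(beta0, beta1, pred, outcome, w=None):
    """Weighted Bernoulli log-likelihood for a one-predictor logistic model."""
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(outcome, dtype=float)
    if w is None:
        w = np.ones_like(pred)                     # unweighted case
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * pred)))
    return np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))

# compact, weighted representation (3 rows)
ll_weighted = bernoulli_loglik(-2.0, 4.0, [1, 0, 0], [1, 1, 0],
                               w=np.array([20.0, 20.0, 200.0]))

# expanded representation (240 rows)
pred_full = np.r_[np.ones(20), np.zeros(220)]
outcome_full = np.r_[np.ones(40), np.zeros(200)]
ll_expanded = bernoulli_loglik(-2.0, 4.0, pred_full, outcome_full)

print(np.isclose(ll_weighted, ll_expanded))  # True: same likelihood
```

Since the likelihoods agree everywhere, any likelihood-based fit (penalized or not) gives the same results from either representation.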

**Comments**

I really like your blog. I'm curious as to why SAS and R produced dissimilar answers in the first case but were nearly identical using the Firth method. Do you know why?

Estimation of parameters in logistic regression is iterative. The two programs use different stopping rules (convergence criteria). While `proc logistic` monitors the first derivative of the log likelihood, `glm()` in R uses a criterion based on the relative change in the deviance. Both programs falsely declare convergence, although the parameter estimates should in fact be infinite. Hope that helps, Georg
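Georg's point can be seen directly: under separation, the unpenalized Newton/scoring iterations never settle down; the slope just keeps growing, so whichever stopping rule fires first determines the (arbitrary) large number each program reports. A rough Python illustration of plain Newton iterations on these data (my sketch, not either package's actual algorithm):

```python
import numpy as np

# the separated data: every pred=1 row is an event
pred = np.r_[np.ones(20), np.zeros(220)]
outcome = np.r_[np.ones(40), np.zeros(200)]
X = np.column_stack([np.ones(240), pred])

beta = np.zeros(2)
slope_path = []
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    # ordinary (unpenalized) Newton step: I(b)^{-1} X'(y - p)
    beta = beta + np.linalg.solve(X.T @ (W[:, None] * X),
                                  X.T @ (outcome - p))
    slope_path.append(beta[1])

# the slope grows without bound; only the stopping rule decides
# which large finite value gets reported as the "estimate"
print(slope_path[4], slope_path[14], slope_path[24])
```

Running a few more iterations than R's 17 (or fewer, like SAS) simply lands on a different point of this diverging path, which is why 19.7 and 20.9 are both reported for the same infinite quantity.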
