SAS and R: Example 8.16: Exact logistic regression

Tuesday, November 30, 2010

Example 8.16: Exact logistic regression

In example 8.15, on Firth logistic regression, we mentioned alternative approaches to separation troubles. Here we demonstrate exact logistic regression. The code for this appears in the book (section 4.1.2) but we don't show an example of it there. We'll consider the setting of observing 100 subjects with x=0 and 100 subjects with x=1, with no events when x=0 and 5 events when x=1.
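To see the separation trouble in this setting, here is a minimal R sketch that fits ordinary maximum likelihood logistic regression to the same summary data. The object names `d` and `fit` are ours, not from the book. Because there are no events when x=0, the likelihood has no finite maximum in the x coefficient, and glm() should warn about fitted probabilities numerically equal to 0 or 1.

```r
# The example's data in summary form: events (count) out of n trials per x.
d <- data.frame(x = c(0, 1), count = c(0, 5), n = c(100, 100))

# Ordinary maximum likelihood via glm(); the response is cbind(events, non-events).
# Expect warnings here: with no events at x = 0 the estimate drifts toward infinity.
fit <- glm(cbind(count, n - count) ~ x, family = binomial, data = d)

# The x coefficient is very large; its exact value only reflects where the
# iterations stopped, not a meaningful estimate.
coef(fit)
```

The exact and Firth approaches are two ways of getting a usable estimate and p-value out of data like these.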

SAS
We'll enter the data in summary form, with one row per value of x, rather than one row per subject. Then we can use the "events/trials" syntax (section 4.1.1) that both proc logistic and proc genmod accept. This is another way to reduce the size of data sets (along with the weight option mentioned previously) but is less generally useful. The exact statement in proc logistic fits the exact logistic regression and generates a p-value. The estimate option is required to display the estimated log odds ratio.


data exact;
  x=0; count=0; n=100; output;
  x=1; count=5; n=100; output;
run;

proc logistic data=exact;
  model count/n = x;
  exact x / estimate;
run;

This generates the following output:


                   Exact Parameter Estimates

                        Standard      95% Confidence
Parameter   Estimate       Error         Limits          p-Value

x             1.9414*          .    -0.0677   Infinity    0.0594

         NOTE: * indicates a median unbiased estimate.

R
In R we use the elrm() function in the elrm package to approximate exact logistic regression, as described in a paper by the package's authors (Zamar, McNeney, and Graham, Journal of Statistical Software, 2007). The function requires a special formula object with syntax identical to the SAS events/trials syntax. (Note that the function does not behave as expected when identical observations with trials=1 are submitted. Thus data should be collapsed into unique combinations of predictors before using the function.) In addition, it requires its data to be included in a data frame. We'll construct the data frame in one function call to data.frame().


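# one row per value of x: count events out of n trials
elrmdata = data.frame(count=c(0,5), x=c(0,1), n=c(100,100))
library(elrm)
# 22000 iterations with the first 2000 discarded as burn-in leaves 20000;
# dataset= is the full argument name (it appears this way in the Call below)
resexact = elrm(count/n ~ x, interest = ~x, iter=22000, 
  burnIn=2000, dataset=elrmdata, r=2)
summary(resexact)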

producing the following result:


Call:
[[1]]
elrm(formula = count/n ~ x, interest = ~x, r = 2, iter = 22000, 
    dataset = elrmdata, burnIn = 2000)

Results:
  estimate p-value p-value_se mc_size
x   2.0225 0.02635     0.0011   20000

95% Confidence Intervals for Parameters

        lower upper
x -0.02065572   Inf
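Regarding the note above about collapsing data before calling elrm(): the following is a minimal sketch of one way to do it in base R, assuming the data start as one row per subject. The data frame `long` and its outcome column `y` are hypothetical names used only for illustration.

```r
# Hypothetical subject-level data: 100 subjects at x = 0 with no events,
# 100 subjects at x = 1 with 5 events.
long <- data.frame(
  x = rep(c(0, 1), each = 100),
  y = c(rep(0, 100), rep(1, 5), rep(0, 95))
)

# Events and trials for each unique value of the predictor.
count_df <- aggregate(y ~ x, data = long, FUN = sum)     # number of events
n_df     <- aggregate(y ~ x, data = long, FUN = length)  # number of subjects

# Both aggregate() calls return rows in the same order, so the columns line up.
collapsed <- data.frame(x = count_df$x, count = count_df$y, n = n_df$y)
collapsed
```

With more than one predictor, the same idea applies with a formula such as `y ~ x1 + x2`, which gives one row per unique combination of predictor values.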

Differences between the SAS and R results most likely arise from the fact that the elrm() function approximates the exact approach by Markov chain Monte Carlo rather than enumerating the exact distribution. The upper limit of infinity seen in both the exact SAS analysis and the approximate exact elrm() analysis reveals a limitation of this approach relative to the Firth approach seen in example 8.15 and the Bayesian approach we'll examine later.
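For a design this small, the exact conditional distribution can be written down directly, which gives a check on both sets of output. The sketch below is our own calculation, not code from the book or from either package. It assumes the usual conditional argument: given 5 events in total, the number of events among the x=1 subjects follows a noncentral hypergeometric distribution indexed by the log odds ratio. The names `k`, `w`, `p_top`, `mue`, and `lower` are ours.

```r
# Exact conditional distribution of T = number of events among the x = 1
# subjects, given 5 events in total among 100 + 100 subjects.
k <- 0:5
w <- choose(100, k) * choose(100, 5 - k)   # number of ways to obtain T = k

# Pr(T = 5 | beta), where beta is the log odds ratio; T = 5 is what we observed
# and is also the largest possible value, so this is the whole upper tail.
p_top <- function(beta) {
  pr <- w * exp(beta * k)
  pr[6] / sum(pr)
}

# Twice the one-sided tail probability under beta = 0
2 * p_top(0)                                                        # about 0.0594

# Median unbiased estimate: the beta at which the observed tail has probability 1/2
mue <- uniroot(function(b) p_top(b) - 0.5, c(-5, 10), tol = 1e-10)$root
mue                                                                 # about 1.9414

# Lower 95% limit: the beta at which the observed tail has probability 0.025
lower <- uniroot(function(b) p_top(b) - 0.025, c(-5, 10), tol = 1e-10)$root
lower                                                               # about -0.0677
```

These values agree with the SAS output. The one-sided tail probability p_top(0) is about 0.0297, which is close to the elrm() p-value of 0.02635, so part of the gap between the two reported p-values appears to be a difference in how the p-value is defined rather than approximation error alone.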

A final note: if the true Pr(Y=1|X=1) = 0.05, then the true Pr(Y=1|X=0) that results in a log odds ratio of 1.94 is about 0.0075; for a log odds ratio of 2.02, the true probability is about 0.0069.
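The arithmetic behind this note can be sketched in a few lines of R. The function name `p0_from_logor` is ours; it divides the odds at X=1 by the odds ratio and converts the result back to a probability.

```r
p1 <- 0.05   # assumed true Pr(Y=1 | X=1)

# Pr(Y=1 | X=0) implied by a given log odds ratio and Pr(Y=1 | X=1)
p0_from_logor <- function(log_or, p1) {
  odds0 <- (p1 / (1 - p1)) / exp(log_or)   # odds at X=0 = odds at X=1 / odds ratio
  odds0 / (1 + odds0)                      # convert odds back to a probability
}

p0 <- p0_from_logor(c(1.94, 2.02), p1)
p0                                         # about 0.0075 and 0.0069

# The same calculation on the logit scale, as a cross-check
plogis(qlogis(p1) - c(1.94, 2.02))
```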

3 comments:

Anonymous said...: Thank you very much for pointing out that data need to be collapsed for elrm to run. That solved the problem I was having running my model. You made my day : ); February 27, 2011 at 3:34 PM
Ken Kleinman said...: You're welcome! It took a bunch of messing around to figure it out myself, and I'm glad you were able to benefit from it.; February 28, 2011 at 10:24 AM
Laura said...: Very good blog. But I want to ask where is the article that you mentioned at the beginning. Thank you.; September 23, 2014 at 2:17 PM

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education.

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses.
