Tuesday, November 8, 2011

Example 9.13: Negative binomial regression with proc mcmc


In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support for Bayesian methods in R can be found in the Bayesian Task View.

SAS

As noted in example 8.30, the SAS rand function lacks an option to specify the mean directly, instead taking the basic parameters: the probability of success and the number of successes k. (Note, though, that the negative binomial has several formulations, which can cause problems when moving between software systems.) As developed in that example, we use a function compiled with proc fcmp to work with the mean instead.

proc fcmp outlib=sasuser.funcs.test;
  function poismean_nb(mean, size);
    /* success probability p = k / (mu + k) for the desired mean */
    return(size / (mean + size));
  endsub;
run;

options cmplib=sasuser.funcs;
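To check the algebra behind poismean_nb, here is a small Python sketch (not part of the SAS workflow; the helper names are ours). With p = k/(mu + k), the mean of the number of failures before the k-th success, k(1 - p)/p, works out to exactly mu.

```python
def poismean_nb(mean, size):
    """Python mirror of the fcmp helper: success probability p = k / (mu + k)."""
    return size / (mean + size)

def nb_mean(p, k):
    """Mean of the number of failures before the k-th success: k * (1 - p) / p."""
    return k * (1 - p) / p

# The conversion recovers the desired mean for any mu > 0.
for mu in (0.5, 1.0, 3.7):
    k = 2
    p = poismean_nb(mu, k)
    assert abs(nb_mean(p, k) - mu) < 1e-12
print("ok")
```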

With that preparation out of the way, we simulate some data--here an intercept of 0 and a slope of 1.

data test;
  do i = 1 to 10000;
    x = normal(0);
    mu = exp(0 + x);   /* intercept 0, slope 1 */
    k = 2;
    y = rand("NEGBINOMIAL", poismean_nb(mu, k), k);
    output;
  end;
run;
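As a sanity check on the parameterization (again in Python, outside SAS; the rand_negbin helper is ours), we can draw negative binomial variates directly, by counting failures before the k-th success in Bernoulli trials, and compare the sample moments with the theoretical mean mu and variance mu + mu^2/k.

```python
import random

random.seed(1)

def rand_negbin(p, k):
    """Number of failures before the k-th success in Bernoulli(p) trials."""
    failures = successes = 0
    while successes < k:
        if random.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

mu, k = 2.0, 2
p = k / (mu + k)                      # poismean_nb(mu, k)
draws = [rand_negbin(p, k) for _ in range(200_000)]
m = sum(draws) / len(draws)
v = sum((d - m) ** 2 for d in draws) / len(draws)
# Theory: mean = mu = 2.0, variance = mu + mu**2 / k = 4.0
print(round(m, 1), round(v, 1))
```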

The proc mcmc code presents a slight difficulty: the number of successes k before the random number of failures ought to be an integer, and proc mcmc appears to lack an integer-valued distribution. The model will run with continuous values of k, but its behavior is erratic. Instead, we put a prior on a new parameter, kstar, and take k as the rounded value (section 1.8.4) of kstar; since the values must be > 0, we also add 1 before rounding.

proc mcmc data=test nmc=1000 thin=1 seed=10061966;
  parms beta0 1 beta1 1 kstar 10;

  prior b: ~ normal(0, var=10000);
  prior kstar ~ igamma(.01, scale=0.01);

  k = round(kstar + 1, 1);
  mu = exp(beta0 + beta1 * x);

  model y ~ negbin(k, poismean_nb(mu, k));
run;

The kstar/k device works because proc mcmc processes the programming statements in each iteration of the chain, so k is recomputed from the current draw of kstar. Posterior summaries appear just below; a sample diagnostic plot is shown above.
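A quick Python sketch of the kstar-to-k mapping (the helper name k_from_kstar is ours). SAS's ROUND rounds halves away from zero, which for a positive argument is floor(x + 0.5), so every kstar > 0 maps to an integer k >= 1.

```python
import math

def k_from_kstar(kstar):
    """Integer size parameter from the continuous draw, mimicking
    SAS's round(kstar + 1, 1); assumes kstar > 0."""
    return math.floor(kstar + 1 + 0.5)

print([k_from_kstar(v) for v in (0.2, 0.96, 1.7)])  # [1, 2, 3]
```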

Posterior Summaries

Parameter      N      Mean   Std Dev       25%       50%      75%
beta0      10000   0.00712    0.0131  -0.00171   0.00721   0.0156
beta1      10000    0.9818    0.0128    0.9732    0.9814   0.9905
kstar      10000    0.9648    0.2855    0.7112    0.9481   1.1974

Posterior Intervals

Parameter   Alpha   Equal-Tail Interval        HPD Interval
beta0       0.050   (-0.0195, 0.0321)    (-0.0182, 0.0328)
beta1       0.050   ( 0.9569, 1.0074)    ( 0.9562, 1.0063)
kstar       0.050   ( 0.5208, 1.4709)    ( 0.5001, 1.4348)

If a simple model like the one shown here is all you need, proc genmod's bayes statement can work for you. But the formulation demonstrated above would be useful for a generalized linear mixed model, for example.

2 comments:

Anonymous said...

And why would one go through these contortions when we have Stata??? ;) No, seriously, this is like the old days when we did conditional logistic regression by tricking PROC PHREG in SAS.

Joshua Wiley said...

@Constantine Do you have a suggestion for how to easily fit general Bayesian models in Stata? I would love to hear it if you do. For example, I would like to fit a cumulative logit model with cross classified random effects. The priors on the random effect covariances can be reasonably tight around 0, though they should not be constrained.