
Tuesday, November 8, 2011

Example 9.13: Negative binomial regression with proc mcmc


In practice, data that derive from counts rarely seem to be fit well by a Poisson model; one more flexible alternative is a negative binomial model. In this SAS-only entry, we discuss how proc mcmc can be used for estimation. An overview of support for Bayesian methods in R can be found in the Bayesian Task View.

SAS

As noted in example 8.30, the SAS rand function lacks the option to input the mean directly, instead using the basic parameters of the probability of success and the number of successes k. (Note that the negative binomial has several formulations, which can cause confusion when moving between software systems.) As developed in that example, we use a function defined with proc fcmp to work with the mean instead.

proc fcmp outlib=sasuser.funcs.test;
  /* convert a desired mean to the success probability
     expected by rand("NEGBINOMIAL", p, k) */
  function poismean_nb(mean, size);
    return(size/(mean+size));
  endsub;
run;

options cmplib=sasuser.funcs;
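The algebra behind poismean_nb can be checked outside SAS. In the (p, k) parameterization counting failures before the kth success, the negative binomial mean is k(1-p)/p, so returning p = k/(mu+k) makes the mean come out to mu. A quick stdlib-Python check (the helper names mirror the SAS function but are otherwise illustrative):

```python
def poismean_nb(mean, size):
    # Same conversion as the SAS fcmp function: the success probability
    # that yields a negative binomial with the requested mean.
    return size / (mean + size)

def negbin_mean(p, size):
    # Mean of a negative binomial counting failures before the
    # size-th success, parameterized by success probability p.
    return size * (1 - p) / p

# Round-tripping recovers the requested mean for any mu, k > 0.
for mu in (0.5, 1.0, 3.7):
    for k in (1, 2, 10):
        assert abs(negbin_mean(poismean_nb(mu, k), k) - mu) < 1e-12
```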

With that preparation out of the way, we simulate some data--here an intercept of 0 and a slope of 1.

data test;
  do i = 1 to 10000;
    x = normal(0);
    mu = exp(0 + x);
    k = 2;
    y = rand("NEGBINOMIAL", poismean_nb(mu, k), k);
    output;
  end;
run;
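As a sanity check on this setup, here is a rough Python sketch (stdlib only, not part of the SAS workflow) that simulates the same distribution via the standard gamma-Poisson mixture representation and confirms the mean and variance come out near mu and mu + mu^2/k:

```python
import math
import random

def rpoisson(lam, rng):
    # Knuth's Poisson sampler; fine for the modest means used here.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def rnegbin(mu, k, rng):
    # Gamma-Poisson mixture: a Poisson draw whose rate is itself drawn
    # from Gamma(shape=k, scale=mu/k) is negative binomial with
    # mean mu and variance mu + mu**2 / k.
    return rpoisson(rng.gammavariate(k, mu / k), rng)

rng = random.Random(20111108)
mu, k, n = 2.0, 2, 100_000
draws = [rnegbin(mu, k, rng) for _ in range(n)]
m = sum(draws) / n
v = sum((d - m) ** 2 for d in draws) / (n - 1)
# m should land near mu = 2 and v near mu + mu^2/k = 4.
```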

The proc mcmc code presents a slight difficulty: the number of successes k ought to be an integer, but proc mcmc appears to lack integer-valued distributions. The model will run with continuous values of k, but its behavior is strange. Instead, we put a prior on a new parameter, kstar, and take k as the rounded value (section 1.8.4) of kstar + 1; adding 1 ensures k > 0.

proc mcmc data=test nmc=10000 thin=1 seed=10061966;
  parms beta0 1 beta1 1 kstar 10;

  prior b: ~ normal(0, var=10000);
  prior kstar ~ igamma(0.01, scale=0.01);

  k = round(kstar + 1, 1);
  mu = exp(beta0 + beta1 * x);

  model y ~ negbin(k, poismean_nb(mu, k));
run;

The kstar-to-k business works because proc mcmc executes the programming statements at each iteration of the chain, so k is recomputed from the current draw of kstar. Posterior summaries appear just below; a sample diagnostic plot is above.
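In Python terms, the mapping from kstar to k amounts to the following (illustrative only; note that Python's round() breaks ties to even while SAS rounds halves away from zero, which matters only at exact half-integers):

```python
def k_from_kstar(kstar):
    # Mirrors the SAS statement k = round(kstar + 1, 1): shift the
    # continuous positive parameter up by 1, then round to the nearest
    # integer, guaranteeing an integer k >= 1.
    return round(kstar + 1)
```

Applied to the posterior mean of kstar reported below (0.9648), this mapping gives k = 2, the value used to simulate the data.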

Posterior Summaries

                              Standard              Percentiles
Parameter      N      Mean   Deviation       25%       50%       75%
beta0      10000   0.00712      0.0131  -0.00171   0.00721    0.0156
beta1      10000    0.9818      0.0128    0.9732    0.9814    0.9905
kstar      10000    0.9648      0.2855    0.7112    0.9481    1.1974

Posterior Intervals

Parameter   Alpha     Equal-Tail Interval          HPD Interval
beta0       0.050    -0.0195      0.0321     -0.0182      0.0328
beta1       0.050     0.9569      1.0074      0.9562      1.0063
kstar       0.050     0.5208      1.4709      0.5001      1.4348

If a simple model like the one shown here is all you need, proc genmod's bayes statement can work for you. But the formulation demonstrated above would be useful for a generalized linear mixed model, for example.

Tuesday, October 4, 2011

Example 9.8: New stuff in SAS 9.3-- Bayesian random effects models in Proc MCMC




Rounding off our reports on major new developments in SAS 9.3, today we'll talk about proc mcmc and the random statement.

Stand-alone packages for fitting very general Bayesian models using Markov chain Monte Carlo (MCMC) methods have been available for quite some time now. The best known of these are BUGS and its derivatives WinBUGS (last updated in 2007) and OpenBUGS. There are also packages that call these tools from R.

Today we'll consider a relatively simple model: Clustered Poisson data where cluster means are a constant plus a cluster-specific exponentially-distributed random effect. To be clear:
y_ij ~ Poisson(mu_i)
log(mu_i) = B_0 + r_i
r_i ~ Exponential(lambda), where lambda is the scale (and hence the mean) of the random effect
Of course in Bayesian thinking all effects are random-- here we use the term in the sense of cluster-specific effects.

SAS
Several SAS procedures have a bayes statement that allows certain specific models to be fit. For example, in Section 6.6 and example 8.17, we show Bayesian Poisson and logistic regression, respectively, using proc genmod. But our example today is a little unusual, and we could not find a canned procedure for it. For these more general problems, SAS offers proc mcmc, which in SAS 9.3 allows random effects to be modeled easily.

We begin by generating the data, and fitting the naive (unclustered) model. We set B_0 = 1 and lambda = 0.4. There are 200 clusters of 10 observations each, which we might imagine represent 10 students from each of 200 classrooms.

data test2;
  truebeta0 = 1;
  randscale = .4;
  call streaminit(1944);
  do i = 1 to 200;
    randint = rand("EXPONENTIAL") * randscale;
    do ni = 1 to 10;
      mu = exp(truebeta0 + randint);
      y = rand("POISSON", mu);
      output;
    end;
  end;
run;
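As a cross-check on the data step, here is a rough Python mirror (stdlib only; the Poisson sampler is Knuth's algorithm, and the seed and structure are chosen to parallel, not reproduce, the SAS code):

```python
import math
import random

def rpoisson(lam, rng):
    # Knuth's Poisson sampler; adequate for the means that arise here.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(1944)                       # seed echoes the SAS code
truebeta0, randscale = 1.0, 0.4
ys = []
for i in range(200):                            # 200 clusters...
    randint = rng.expovariate(1.0) * randscale  # exponential, mean 0.4
    mu = math.exp(truebeta0 + randint)
    ys.extend(rpoisson(mu, rng) for _ in range(10))  # ...of 10 each

# The marginal mean is exp(1) * E[exp(r)] = exp(1) / (1 - 0.4), about 4.53,
# so the overall sample mean sits well above exp(1) = 2.72.
print(sum(ys) / len(ys))
```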

proc genmod data=test2;
  model y = / dist=poisson;
run;

                      Standard        Wald 95%
Parameter   Estimate     Error   Confidence Limits

Intercept     1.4983    0.0106    1.4776    1.5190
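The 1.4983 is no accident. Since r is exponential with scale 0.4, E[exp(r)] = 1/(1 - 0.4) (the exponential MGF at 1), so the marginal mean is exp(1)/0.6, and an intercept-only Poisson fit targets its log, about 1.51. A quick check of the arithmetic:

```python
import math

B0, scale = 1.0, 0.4
# Exponential MGF at t=1: E[exp(r)] = 1 / (1 - scale), valid for scale < 1.
marginal_mean = math.exp(B0) / (1 - scale)
naive_intercept = math.log(marginal_mean)  # what an unclustered fit targets
print(round(naive_intercept, 4))           # -> 1.5108, close to genmod's 1.4983
```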

Note the inelegant SAS syntax for fitting an intercept-only model. The result is pretty awful-- a 50% bias with respect to the true intercept B_0 = 1. Perhaps we'll do better by acknowledging the clustering. We might try that with normally distributed random effects in proc glimmix.

proc glimmix data=test2 method=laplace;
  class i;
  model y = / dist=poisson solution;
  random int / subject=i type=un;
run;

Cov                          Standard
Parm       Subject  Estimate    Error
UN(1,1)    i          0.1682  0.01841

                     Standard
Effect     Estimate     Error  t Value  Pr > |t|
Intercept    1.3805   0.03124    44.20    <.0001

No joy-- still a nearly 40% bias in the estimated intercept. (The estimated random-effect variance, 0.168, is in fact close to the variance of the true exponential effects, 0.4^2 = 0.16.) Let's try fitting the model that generated the data.

proc mcmc data=test2 nmc=10000 thin=10 seed=2011;
  parms fixedint 1 gscale 0.4;

  prior fixedint ~ normal(0, var=10000);
  prior gscale ~ igamma(0.01, scale=0.01);

  random rint ~ gamma(shape=1, scale=gscale) subject=i initial=0.0001;
  mu = exp(fixedint + rint);
  model y ~ poisson(mu);
run;

The key options on the proc mcmc statement are nmc, the total number of Monte Carlo iterations to perform, and thin, which retains only every nth (here, every 10th) sample for inference. The prior and model statements are fairly self-explanatory; we note that in more complex models, parameters listed in a single parms statement are sampled as a block. We're placing priors on the fixed (shared) intercept and on the scale of the exponential. The mu line is just a programming statement-- it uses the same syntax as data step programming.

The newly available statement is random. Its syntax is similar to that of the prior statements, with the addition of the subject= option, which generates a unique parameter for each level of the subject variable. The random effects themselves can be used in later statements, as shown, to enter into data distributions. A final note: the exponential distribution isn't explicitly available, but since a gamma distribution with shape fixed at 1 is the exponential, this is not a problem. Here are the key results.

Posterior Summaries

                           Standard
Parameter     N      Mean  Deviation
fixedint   1000    1.0346     0.0244
gscale     1000    0.3541     0.0314

Posterior Intervals

Parameter Alpha HPD Interval
fixedint 0.050 0.9834 1.0791
gscale 0.050 0.2937 0.4163

The 95% HPD regions include the true values of the parameters and the posterior means are much less biased than in the model assuming normal random effects.
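As an aside, the claim above that a gamma with shape fixed at 1 is the exponential can be verified from the densities: Gamma(x; shape=1, scale) reduces to (1/scale)exp(-x/scale), exactly the exponential density. A small stdlib-Python check:

```python
import math

def gamma_pdf(x, shape, scale):
    # Gamma density: x^(shape-1) * exp(-x/scale) / (Gamma(shape) * scale^shape)
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

def exp_pdf(x, scale):
    # Exponential density with mean `scale`.
    return math.exp(-x / scale) / scale

# With shape = 1, the gamma density reduces to the exponential density.
for x in (0.1, 0.5, 1.0, 3.0):
    assert abs(gamma_pdf(x, 1.0, 0.4) - exp_pdf(x, 0.4)) < 1e-12
```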

As usual, MCMC models should be evaluated carefully for convergence and coverage. In this example, I have some concerns (see default diagnostic figure above) and if it were real data I would want to do more.

R
The CRAN task view on Bayesian Inference includes a summary of general and model-specific MCMC tools. However, there is nothing like proc mcmc in the sense of a general, easy-to-use tool that is native to R. The nearest options are to use R front ends to WinBUGS/OpenBUGS (R2WinBUGS) or JAGS (rjags). (A brief worked example of using rjags was posted last year by John Myles White.) Alternatively, with some math and a little sweat, the mcmc package would also work. We'll explore an approach through one or more of these packages in a later entry, and would welcome collaboration from anyone who would like to take that on.