Friday, March 5, 2010

Example 7.25: compare draws with distribution

In example 7.24, we demonstrated a Metropolis-Hastings algorithm for generating observations from awkward distributions. In such settings it is desirable to assess the quality of draws by comparing them with the target distribution.

Recall that the distribution function is


f(y) = c e^(-y^4)(1+|y|)^3


The constant c was not needed to generate draws, but is required for calculation of the probability density function. We can find the normalizing constant c using symbolic mathematics software (try running it yourself using Wolfram Alpha). This yields a result .25 + (.75 * Pi^(.5) + Gamma(5/4) + Gamma(7/4) for the integral over the positive real line, which when doubled gives a value of c=6.809610784.



SAS

To compare the distribution of the variates with the truth, we first generate a density estimate using proc kde (sections 2.6.4, 5.1.16), saving the density estimates in a new data set.


ods select none;
proc kde data=mh;
univar x / out=mhepdf;
run;
ods exclude none;


Then we generate the true values, using the constant calculated above.


data mh2;
set mhepdf;
true = 6.809610784**(-1) *
exp(-value**4) * (1 + abs(value))**3;
run;


Finally, we can plot the estimated and true values together. We use the legend statement (section 5.2.14) to approximate the R legend created below.


legend1 position=(bottom center inside)
across=1 noframe label=none value=(h=1.5);
axis1 label=(angle=90 "Density");
symbol1 i=j l=1 w=3 c=black;
symbol2 i=j l=2 w=3 c=black;
proc gplot data = mh2;
plot (density true)*value / overlay vaxis = axis1 legend=legend1;
label value="x" density="Simulated" true="True";
run;


R

In R, we begin by defining a function to calculate f(y)


pdfeval = function(x) {
return(1/6.809610784*exp(-x^4)*(1+abs(x))^3)
}


The curve() function in R is useful for plotting values of functions over a range. We use it here to plot f(y). Then we use the density() function (section 5.1.16) to estimate the density function from the variates. Rather than save this information in an object, we use it as input to the lines function (section 5.2.1).


curve(pdfeval, from=-2, to=2, lwd=2, lty=2, type="l",
ylab="probability", xlab="Y")
lines(density(res), lwd=2, lty=1)
legend(-1, .1, legend=c("True", "Simulated"),
lty=2:1, lwd=2, cex=1.8, bty="n")


Results from both systems are plotted below.




We note that in each plot, the sampled variates appear to miss the distribution function near 0. This is an artifact of the default smoothing used by the density estimators, and could be tuned away.

No comments: