Wednesday, June 24, 2009

Example 7.3: Simple jittered scatterplot with smoother for dichotomous outcomes with continuous predictors

It's useful to look at scatterplots even when the "y" variable is dichotomous. For example, this can help determine whether categorization or linear assumptions would be more plausible. However, an unmodified scatterplot is less than helpful, since all of the "y" values are either 0 or 1, and are hard to separate visually. Some jittering (section 5.2.4) is useful in that regard. In addition, it is often useful to plot a smoothed line through the data. We use the data generated in section 7.2 to demonstrate.


SAS
In SAS, we add jitter, then plot the jittered values and the observed values on the same plot using the overlay option. We display the jittered values as dots and add a smoothed line through the real (not jittered) data without displaying their values using symbol statements (sections 5.2.2, 5.2.6).

data ds2;
set test;
yplot = ytest + uniform(0) * .2;
run;

symbol1 i = sm50s v = none c = black;
symbol2 i = none v = dot c = black;
proc gplot data = ds2;
plot (ytest yplot) * xtest / overlay;
run;
And the resulting plot is:




















R
In R, we display a scatterplot (section 5.1.1) of the jittered values against the covariate. The jitter() function (section 5.2.4) is called within the plot() function. We then add the smoothed line, based on the real (not jittered) data using the lines() function (section 5.2.1), called with the appropriate lowess() (section 5.2.6) object as input.

plot(xtest,jitter(ytest))
lines(lowess(xtest,ytest))

And the resulting plot is:




















These plots are useful, but fairly unattractive. In our next example, we'll make them prettier.

Book now discounted 33% at Amazon!

Our book, SAS and R: Data Management, Statistical Analysis, and Graphics, is discounted by a full third at Amazon. With free shipping! Also, they claim if it is further discounted before it ships, they'll give you the reduced price.

Saturday, June 13, 2009

Example 7.2: Simulate data from a logistic regression

It might be useful to be able to simulate data from a logistic regression (section 4.1.1). Our process is to generate the linear predictor, then apply the inverse link, and finally draw from a distribution with this parameter. This approach is useful in that it can easily be applied to other generalized linear models. In this example we assume an intercept of 0 and a slope of 0.5, and generate 1,000 observations. See section 4.6.1 for an example of fitting logistic regression.


SAS
In SAS, we do this within a data step. We define parameters for the model and use looping (section 1.11.1) to replicate the model scenario for random draws of standard normal covariate values (section 1.10.5), calculating the linear predictor for each, and testing the resulting expit against a random draw from a standard uniform distribution (section 1.10.3).

data test;
intercept = 0;
beta = .5;
do i = 1 to 1000;
xtest = normal(12345);
linpred = intercept + (xtest * beta);
prob = exp(linpred)/ (1 + exp(linpred));
ytest = uniform(0) lt prob;
output;
end;
run;


R
In R we begin by assigning parameter values for the model. We then generate 1,000 random normal variates (section 1.10.5), calculating the linear predictor and expit for each, and then testing vectorwise (section 1.11.2) against 1,000 random uniforms (1.10.3).

intercept = 0
beta = 0.5
xtest = rnorm(1000,1,1)
linpred = intercept + xtest*beta
prob = exp(linpred)/(1 + exp(linpred))
runis = runif(1000,0,1)
ytest = ifelse(runis < prob,1,0)

Friday, June 12, 2009

Example 7.1: Create a Fibonacci sequence

The Fibonacci numbers have many mathematical relationships and have been discovered repeatedly in nature. They are constructed as the sum of the previous two values, initialized with the values 1 and 1.

A pdf of this example is available here.

SAS
In SAS, we use the lag function (section 1.4.17, p. 22) to retrieve the last value.

data fibo;
do i = 1 to 10;
fib = sum(fib, lag(fib));
if i eq 1 then fib = 1;
output;
end;
run;
proc print data=fibo;
run;

This generates the following output:

Obs i fib
1 1 1
2 2 1
3 3 2
4 4 3
5 5 5
6 6 8
7 7 13
8 8 21
9 9 34
10 10 55

R
In R we can loop over an array to perform the same job.

len = 10
fibvals = numeric(len)
fibvals[1] = 1
fibvals[2] = 1
for (i in 3:len) {
fibvals[i] = fibvals[i-1] + fibvals[i-2]
}

This generates the following output:

> fibvals
[1] 1 1 2 3 5 8 13 21 34 55