SAS and R: Example 9.12: simpler ways to carry out permutation tests

Monday, October 31, 2011

In a previous entry, as well as in section 2.4.3 of the book, we described how to carry out a two-group permutation test in SAS as well as with the coin package in R. We demonstrated it by comparing the ages of the female and male subjects in the HELP study.

In this entry, we revisit the permutation test using other functions.

R

We describe a simpler interface to carry out and visualize permutation tests using the functions from the mosaic package.

We begin by replicating our previous example (section 2.6.4, p. 87).


ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(coin)
numsim = 1000
oneway_test(age ~ as.factor(female), 
  distribution=approximate(B=numsim-1), data=ds)

which yields the following output:


 Approximative 2-Sample Permutation Test

data:  age by as.factor(female) (0, 1) 
Z = -0.9194, p-value = 0.3894
alternative hypothesis: true mu is not equal to 0

We conclude that there is minimal evidence against the null hypothesis that the two groups have the same mean age in their respective populations.

Now we demonstrate a more general way to run this test, using the mosaic package's do() function together with the * operator. The shuffle() function permutes the group labels, and do() repeatedly fits a linear model with a parameter for female to the shuffled data. The coefficient for female is our test statistic: the difference in mean age between females and males.


> library(mosaic)
> obsdiff = with(ds, mean(age[female==1]) - mean(age[female==0]))
> obsdiff
     mean 
0.7841284 
> summary(age ~ female, data=ds, fun=mean)
age    N=453

+-------+---+---+--------+
|       |   |N  |mean    |
+-------+---+---+--------+
|female |No |346|35.46821|
|       |Yes|107|36.25234|
+-------+---+---+--------+
|Overall|   |453|35.65342|
+-------+---+---+--------+

Now we can run the permutation test, then display the results on a souped-up histogram with different shading for values larger in magnitude than the observed statistic (see above).


res = do(numsim) * lm(age ~ shuffle(female), data=ds)
pvalue = sum(abs(res$female) > abs(obsdiff)) / numsim
xhistogram(~ female, groups = abs(female) > abs(obsdiff), 
  n=50, density=TRUE, data=res, xlab="difference between groups",
  main=paste("Permutation test result: p=", round(pvalue, 3)))

The results are similar to those from the previous test: there is little evidence to contradict the null hypothesis.
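For readers without the mosaic package, the same shuffle-and-recompute idea can be sketched in base R. This is only an illustration under stated assumptions: the ages are simulated stand-ins rather than the HELP data, the helper diff_means() is a hypothetical name, and the p-value counts ties as extreme (>=) where the code above uses a strict inequality.

```r
# Base-R sketch of a two-group permutation test for a difference in means.
# The data are simulated stand-ins, NOT the HELP data set.
set.seed(42)
female = rep(c(0, 1), times = c(346, 107))             # group sizes as in the example
age = rnorm(length(female), mean = 35.7, sd = 7.7)     # hypothetical ages

# Hypothetical helper: difference in group means, females minus males
diff_means = function(y, g) mean(y[g == 1]) - mean(y[g == 0])

obsdiff = diff_means(age, female)

numsim = 1000
# sample(x) with no other arguments returns a random permutation of x,
# so each replicate recomputes the statistic with shuffled group labels
permdiffs = replicate(numsim, diff_means(age, sample(female)))

# Two-sided Monte Carlo p-value: share of permuted statistics
# at least as large in magnitude as the observed one
pvalue = mean(abs(permdiffs) >= abs(obsdiff))
pvalue
```

Replacing the simulated age and female vectors with ds$age and ds$female should give a result close to the ones reported above, up to Monte Carlo error.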

SAS

In SAS, we'll take another approach, delving into the capabilities of proc iml to make a manual permutation test. We begin by reading the data and replicating the example in the book.


libname k 'c:\book';
proc npar1way data = k.help;
class female;
var age;
exact scores=data / mc n= 9999 alpha = .05;
run;

                   Data Scores One-Way Analysis

                  Chi-Square               0.8453
                  DF                            1
                  Pr > Chi-Square          0.3579

Permuting data is very awkward in data steps, but it turns out to be easy in proc iml (the built-in SAS matrix language). Here we read in the key variables from the data set (use and read), then generate the permutations (ranperm). This generates a row for each permuted data set, while we need a column for each, so we transpose the matrix (t) before saving it. Then we save the resulting data in a SAS data set along with the female variable. Note that we permuted the ages only, whereas the R example permuted the group labels; it doesn't matter which is permuted, of course. Much of the proc iml code used here can be found in section 1.9 of the book; however, note that curly braces are required in the read statement, as shown below.


proc iml;
use k.help;
read all var{female age} into x;
p = t(ranperm(x[,2],1000));
paf = x[,1]||p;
create newds from paf;
append from paf;
quit;
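For comparison, the layout that the proc iml step produces (the group indicator in the first column, followed by one column per permuted copy of the ages) can be sketched in R. This is a toy illustration, not part of the SAS analysis: the ten ages, the group indicator, and nperm = 5 are all made-up values.

```r
# Toy sketch of the "one column per permutation" layout built above in proc iml.
# All values here are hypothetical.
set.seed(1)
female = rep(c(0, 1), times = c(6, 4))               # hypothetical group indicator
age = c(31, 45, 28, 39, 52, 36, 41, 33, 47, 29)      # hypothetical ages

nperm = 5
# replicate() returns one column per permuted copy of age,
# so no transpose is needed here
p = replicate(nperm, sample(age))

# Bind the unpermuted group indicator to the permuted ages
paf = cbind(female, p)
dim(paf)   # rows = subjects, columns = 1 + nperm
```

Each column of p contains the same ages in a different order, which is what lets a single by-column analysis produce one permuted statistic per column.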

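With the permuted data in hand, we use proc ttest (section 2.4.1) with the ODS system to generate and save the differences. Note that the default variable names from proc iml are fairly nondescript: col1 holds female, and col2 through col1001 hold the permuted ages. With the 1000 permuted statistics in hand, we can generate a histogram of the statistics and a p-value with proc univariate. The loccount option, with mu0 set to the observed difference, counts how many of the permuted absolute differences exceed the observed value.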


ods output conflimits=diff;
proc ttest data=newds plots=none;
  class col1;
  var col2 - col1001;
run;

proc univariate data=diff;
  where method = "Pooled";
  var mean;
  histogram mean / normal;
run;

data diff2;
set diff;
absdiff = abs(mean);
run;

proc univariate data=diff2
  loccount mu0 = 0.7841284;
  where method = "Pooled";
  var absdiff;
run;

                     Location Counts: Mu0=0.78

                     Count                Value

                     Num Obs > Mu0          357
                     Num Obs ^= Mu0        1000
                     Num Obs < Mu0          643

Of the 1000 permuted differences, 357 are larger in absolute value than the observed difference of 0.784, giving a permutation p-value of about 357/1000 = 0.357. This agrees closely with the proc npar1way result above.

1 comment:

Anonymous said (October 31, 2011 at 11:29 AM): The RANPERM function was added in SAS 9.3, but you can do the same analysis in earlier versions by generating random uniform values and using the RANK function to generate the permutation. For a matched pair permutation test, see p. 11-14 of http://support.sas.com/resources/papers/proceedings10/329-2010.pdf


About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education.

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses.
