SAS and R: Example 9.30: addressing multiple comparisons

Monday, May 7, 2012

Example 9.30: addressing multiple comparisons

We've been more sensitive to accounting for multiple comparisons recently, in part due to work that Nick and colleagues published on the topic.

In this entry, we consider results from a randomized trial (Kypri et al., 2009) to reduce problem drinking in Australian university students.
Seven outcomes were pre-specified: three designated as primary and four as secondary. No adjustment for multiple comparisons was undertaken. The p-values were given as 0.001, 0.001 and 0.02 for the primary outcomes and 0.001, 0.22, 0.59 and 0.87 for the secondary outcomes.
In this entry, we detail how to adjust for multiplicity using R and SAS.

R

The p.adjust() function in R applies a variety of multiplicity adjustments to a vector of p-values. These include the Bonferroni procedure (where the alpha is divided by the number of tests or, equivalently, each p-value is multiplied by that number and truncated to 1 if the product exceeds 1). Other, less conservative corrections are also included: Holm (1979), Hochberg (1988), Hommel (1988), Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001). Bonferroni, Holm, Hochberg and Hommel are designed to give strong control of the family-wise error rate, and the latter three are never more conservative than Bonferroni (Hochberg and Hommel require independent or non-negatively associated tests). The Benjamini and Hochberg and Benjamini and Yekutieli methods instead control the false discovery rate, a less stringent criterion. Here we compare the unadjusted p-values with those from the Benjamini and Hochberg (method="BH") and Bonferroni procedures for the Kypri et al. study.


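# p-values reported by Kypri et al., sorted from smallest to largest
pvals = c(.001, .001, .001, .02, .22, .59, .87)
BONF = p.adjust(pvals, "bonferroni")
BH = p.adjust(pvals, "BH")
# combine raw and adjusted p-values (rounded) into one matrix and print it
res = cbind(pvals, BH=round(BH, 3), BONF=round(BONF, 3))
res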

This yields the following results:


     pvals    BH  BONF
[1,] 0.001 0.002 0.007
[2,] 0.001 0.002 0.007
[3,] 0.001 0.002 0.007
[4,] 0.020 0.035 0.140
[5,] 0.220 0.308 1.000
[6,] 0.590 0.688 1.000
[7,] 0.870 0.870 1.000

The only substantive difference between the three sets of unadjusted and adjusted p-values is seen for the 4th most significant outcome, which remains statistically significant at the alpha=0.05 level for all but the Bonferroni procedure.
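To make the adjustments less of a black box, here is a minimal sketch that reproduces the Bonferroni and Benjamini-Hochberg columns by hand and checks them against p.adjust(). The names n, bonf_manual, bh_manual, o and ro are our own illustrative choices; the steps follow the usual description of the BH step-up adjustment rather than any code from the Kypri et al. paper.

```r
# p-values from the Kypri et al. study (same vector as above)
pvals = c(.001, .001, .001, .02, .22, .59, .87)
n = length(pvals)

# Bonferroni: multiply each p-value by the number of tests, cap at 1
bonf_manual = pmin(1, pvals * n)

# Benjamini-Hochberg step-up adjustment:
#   1. order the p-values from largest to smallest
#   2. scale the p-value of rank i (1 = smallest) by n / i
#   3. take the running minimum so adjusted values stay monotone
#   4. cap at 1 and restore the original order
o = order(pvals, decreasing = TRUE)   # positions, largest p-value first
ro = order(o)                         # permutation that undoes the sort
bh_manual = pmin(1, cummin(pvals[o] * n / (n:1)))[ro]

# both hand calculations should agree with p.adjust()
all.equal(bonf_manual, p.adjust(pvals, "bonferroni"))
all.equal(bh_manual, p.adjust(pvals, "BH"))
round(bh_manual, 4)
```

For the 4th outcome, for example, the BH value is 0.02 * 7 / 4 = 0.035, while Bonferroni gives 0.02 * 7 = 0.14, which is why only Bonferroni pushes it above 0.05.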

It is straightforward to display these results graphically:


matplot(res, ylab="p-values", xlab="sorted outcomes")
abline(h=0.05, lty=2)
matlines(res)
legend(1, .9, legend=c("Bonferroni", "Benjamini-Hochberg", "Unadjusted"), 
  col=c(3, 2, 1), lty=c(3, 2, 1), cex=0.7)

It bears mentioning here that the Benjamini-Hochberg procedure really only makes sense in the gestalt. That is, it would probably be incorrect to take the adjusted p-values from above and remove them from the context of the 7 tests performed here. The correct use (as with all tests) is to pre-specify the alpha level, and reject the null hypothesis for tests whose adjusted p-values are smaller. What p.adjust() reports is, for each test, the smallest level at which that test would be rejected: a family-wise error rate for Bonferroni, a false discovery rate for Benjamini-Hochberg. But the nature of the Benjamini-Hochberg procedure is that this value may well depend on the other observed p-values. We will explore this further in a later entry.
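A small sketch may help illustrate this dependence. The vector pvals_alt is made up for illustration (it is not from any study): it keeps the 4th p-value at 0.02 but replaces the three smallest p-values with 0.5. We then compare the adjusted value for that one test in both contexts.

```r
# original seven p-values from the study
pvals = c(.001, .001, .001, .02, .22, .59, .87)

# hypothetical alternative: same 4th p-value (0.02), but the three
# smallest p-values are replaced by made-up values of 0.5
pvals_alt = c(.5, .5, .5, .02, .22, .59, .87)

# BH-adjusted value for the test with raw p = 0.02 in each context
bh_orig = p.adjust(pvals, "BH")[4]
bh_alt = p.adjust(pvals_alt, "BH")[4]

# Bonferroni depends only on the number of tests, so it does not change
bonf_orig = p.adjust(pvals, "bonferroni")[4]
bonf_alt = p.adjust(pvals_alt, "bonferroni")[4]

round(c(bh_orig = bh_orig, bh_alt = bh_alt,
        bonf_orig = bonf_orig, bonf_alt = bonf_alt), 3)

# decisions at a pre-specified false discovery rate of 0.05
reject_orig = p.adjust(pvals, "BH") <= 0.05
reject_alt = p.adjust(pvals_alt, "BH") <= 0.05
reject_orig
reject_alt
```

With the original p-values the BH-adjusted value for this test is 0.035; in the made-up context it is 0.14, so the same raw p-value of 0.02 would no longer be rejected at the 0.05 level. The Bonferroni value is 0.14 in both cases, since it depends only on the number of tests.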

SAS

The multtest procedure implements a number of multiple testing procedures. It works with raw data for ANOVA models, and can also accept a list of p-values as shown here. (Note that the false discovery rate, or "FDR", is the error rate that the Benjamini and Hochberg procedure controls, and SAS uses this name for the procedure.) Various other procedures can do some adjustment through, e.g., the estimate statement, but multtest is the most flexible. The plots= option requests a plot similar to that created in R.


data a;
   input Test$ Raw_P @@;
   datalines;
test01  0.001    test02  0.001    test03  0.001
test04  0.02    test05  0.22    test06  0.59
test07  0.87
;

proc multtest inpvalues=a bon fdr plots=adjusted(unpack);
run;

This yields the following p-values:

                                                     False
                                                 Discovery
            Test           Raw    Bonferroni          Rate

               1        0.0010        0.0070        0.0023
               2        0.0010        0.0070        0.0023
               3        0.0010        0.0070        0.0023
               4        0.0200        0.1400        0.0350
               5        0.2200        1.0000        0.3080
               6        0.5900        1.0000        0.6883
               7        0.8700        1.0000        0.8700

5 comments:

Rick Wicklin said...: You might like to know that the Second Edition of _Multiple Comparisons and Multiple Tests Using the SAS System_ has recently been published. This is an awesome book by Peter Westfall, Randy Tobias, and Russ Wolfinger that describes how to do all kinds of multiple comparisons in SAS. The book's Web page is https://support.sas.com/pubscat/bookdetails.jsp?pc=63594; May 7, 2012 at 3:20 PM
Anonymous said...: Why do we need to adjust for multiple comparisons which violates the likelihood principle?; May 8, 2012 at 8:32 PM
Ken Kleinman said...: Well, most pragmatically, I would expect that making this response would not ameliorate a reviewer's concerns about my article.

But perhaps you have had a different experience?; May 9, 2012 at 5:42 PM
Suleimen A. said...: Hi all,
I would like to know what the default level of confidence is when running the p.adjust() function with the method "BH".
Up to now, I have been unable to find out.
The level of confidence is probably equal to 0.95 and I'm wondering if it is possible to run this function at a level of 0.99.
I'm a novice in statistics.
Many thanks for your answer.
Suleimen; September 24, 2014 at 3:24 AM
Ken Kleinman said...: Hi Suleimen--

I believe the way these adjustments work is that the p-values themselves are adjusted, and then you can use whatever alpha level you like on the resulting values. So if you want the false discovery rate to be 0.01, you would just use that value on the vector resulting from using p.adjust().; September 24, 2014 at 9:02 AM

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.
