Monday, May 7, 2012

Example 9.30: addressing multiple comparisons

We've been more sensitive to accounting for multiple comparisons recently, in part due to work that Nick and colleagues published on the topic.

In this entry, we consider results from a randomized trial (Kypri et al., 2009) to reduce problem drinking in Australian university students.
Seven outcomes were pre-specified: three designated as primary and four as secondary. No adjustment for multiple comparisons was undertaken. The p-values were given as 0.001, 0.001 for the primary outcomes and 0.02 and .001, .22, .59 and .87 for the secondary outcomes.
In this entry, we detail how to adjust for multiplicity using R and SAS.


The p.adjust() function in R calculates a variety of different approaches for multiplicity adjustments given a vector of p-values. These include the Bonferroni procedure (where the alpha is divided by the number of tests or equivalently the p-value is multiplied by that number, and truncated back to 1 if the result is not a probability). Other, less conservative corrections are also included (these are Holm (1979), Hochberg (1988), Hommel (1988), Benjamini and Hochberg (1995) and Benjamini and Yekutieli (2001)). The first four methods provide strong control for the family-wise error rate and all dominate the Bonferroni procedure. Here we compare the results from the unadjusted, Benjamini and Hochberg method="BH" and Bonferroni procedure for the Kypri et al. study.

pvals = c(.001, .001, .001, .02, .22, .59, .87)
BONF = p.adjust(pvals, "bonferroni")
BH = p.adjust(pvals, "BH")
res = cbind(pvals, BH=round(BH, 3), BONF=round(BONF, 3))

This yields the following results:

pvals BH BONF
[1,] 0.001 0.002 0.007
[2,] 0.001 0.002 0.007
[3,] 0.001 0.002 0.007
[4,] 0.020 0.035 0.140
[5,] 0.220 0.308 1.000
[6,] 0.590 0.688 1.000
[7,] 0.870 0.870 1.000

The only substantive difference between the three sets of unadjusted and adjusted p-values is seen for the 4th most significant outcome, which remains statistically significant at the alpha=0.05 level for all but the Bonferroni procedure.

It is straightforward to graphically display these results (as seen above):

matplot(res, ylab="p-values", xlab="sorted outcomes")
abline(h=0.05, lty=2)
legend(1, .9, legend=c("Bonferroni", "Benjamini-Hochberg", "Unadjusted"),
col=c(3, 2, 1), lty=c(3, 2, 1), cex=0.7)

It bears mentioning here that the Benjamini-Hochberg procedure really only make sense in the gestalt. That is, it would probably be incorrect to take the adjusted p-values from above and remove them from the context of the 7 tests performed here. The correct use (as with all tests) is to pre-specify the alpha level, and reject tests with p-values that are smaller. What p.adjust() reports is the smallest family-wise alpha error under which each of the tests would result in a rejection of the null hypothesis. But the nature of the Benjamini-Hochberg procedure is that this value may well depend on the other observed p-values. We will explore this further in a later entry.

The multtest procedure will perform a number of multiple testing procedures. It works with raw data for ANOVA models, and can also accept a list of p-values as shown here. (Note that "FDR" (false discovery rate) is the name that Benjamini and Hochberg give to their procedure and that this nomenclature is used by SAS.) Various other procedures can do some adjustment through, e.g., the estimate statement, but multtest is the most flexible. A plot similar to that created in R is shown below.

data a;
input Test$ Raw_P @@;
test01 0.001 test02 0.001 test03 0.001
test04 0.02 test05 0.22 test06 0.59
test07 0.87

proc multtest inpvalues=a bon fdr plots=adjusted(unpack);
Test Raw Bonferroni Rate

1 0.0010 0.0070 0.0023
2 0.0010 0.0070 0.0023
3 0.0010 0.0070 0.0023
4 0.0200 0.1400 0.0350
5 0.2200 1.0000 0.3080
6 0.5900 1.0000 0.6883
7 0.8700 1.0000 0.8700

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers and PROC-X with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.


Rick Wicklin said...

You might like to know that the Second Edition of _Multiple Comparisions and Multiple Tests Using the SAS System_ has recently been published. This is an awesome book by Peter Westfall, Randy Tobias, and Russ Wolfinger that describes how to do all kinds of multiple comparisons in SAS. The book's Web page is

Anonymous said...

Why do we need to adjust for multiple comparisons which violates the likelihood principle?

Ken Kleinman said...

Well, most pragmatically, I would expect that making this response would not ameliorate a reviewer's concerns about my article.

But perhaps you have had a different experience?

Suleimen A. said...

Hi all,
I would like to know what is the default level of confidence when running the p.adjust() function with the method "BH".
Up to now, i was unable to find out.
The level of confidence is probably equal to 0.95 and i'm wondering if it possible to run this function at a level of 0.99.
I'm novice in statistics.
Many thanks for your answer.

Ken Kleinman said...

Hi Suleimen--

I believe the way these adjustments work is that the p-values themselves are adjusted, and then you can use whatever alpha level you like on the resulting values. So if you want the false discovery rate to be 0.01, you would just use that value on the vector resulting from using p.adjust().