
Monday, May 14, 2012

Example 9.31: Exploring multiple testing procedures

In example 9.30 we explored the effects of adjusting for multiple testing using the Bonferroni and Benjamini-Hochberg (or false discovery rate, FDR) procedures. At the time we claimed that it would probably be inappropriate to extract the adjusted p-values from the FDR method from their context. In this entry we attempt to explain our misgivings about this practice.

The FDR procedure is described in Benjamini and Hochberg (JRSSB, 1995); it is usually classified as a "step-up" procedure, since it starts from the largest ordered p-value and works toward the smallest. Put simply, the procedure has the following steps:

0. Choose the familywise alpha
1. Rank-order the unadjusted p-values
2. Beginning with the Mth (largest) of the ordered p-values p(m),
2a. if p(m) <= alpha*(m/M), then reject all tests 1 ... m,
2b. if not, set m = m-1
3. Repeat steps 2a and 2b until the condition is met
or p(1) > alpha/M
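Sketched as code, the rejection rule reads as follows (a minimal Python illustration we've added for concreteness; `bh_reject` is a hypothetical helper, not part of any package, and M denotes the number of tests):

```python
def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg rejection rule: return a True/False decision per test."""
    M = len(pvals)
    # rank the p-values from smallest to largest, remembering positions
    order = sorted(range(M), key=lambda i: pvals[i])
    reject = [False] * M
    # step from the largest ordered p-value toward the smallest
    for m in range(M, 0, -1):
        if pvals[order[m - 1]] <= alpha * m / M:
            # the first time the condition holds, reject tests 1 ... m
            for i in order[:m]:
                reject[i] = True
            break
    return reject

# the ten p-values used in the SAS and R examples below;
# at alpha = 0.05 the rule rejects all ten
print(bh_reject([.001, .001, .001, .001, .001, .03, .035, .04, .05, .05]))
```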

where M is the number of tests. The "adjusted p-value" based on this procedure is the smallest familywise alpha under which the current test would have been rejected. To calculate this, we can modify the routine above:

1. Rank-order the unadjusted p-values
2. Set ap(M) = p(M); then for the ordered p-values p(m), m = M-1 down to 1,
2a. candidate ap(m) = p(m) * (M/m)
2b. if candidate ap(m) > ap(m+1), then ap(m) = ap(m+1)
2c. else ap(m) = candidate ap(m)

where ap(m) refers to the adjusted p-value corresponding to the mth ordered unadjusted p-value. It's interesting to note that the adjusted p-value for the Mth ordered test is the same as the unadjusted p-value, while the candidate adjusted p-value for the smallest test is the Bonferroni adjusted p-value. The primary difficulty with taking these p-values (as opposed to the test results) out of context is captured in steps 2b and 2c. They imply that the p-value for a given test may be lowered by other observed p-values in the family of tests. It's also true that the adjusted p-value depends on the number of tests included in the family, but this seems somewhat less troubling.
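The adjustment recursion above can be sketched as follows (a Python illustration we've added; `bh_adjust` is our own hypothetical helper and assumes its input is already sorted in increasing order):

```python
def bh_adjust(pvals):
    """FDR-adjusted p-values for a list already sorted in increasing order."""
    M = len(pvals)
    adj = [0.0] * M
    adj[M - 1] = pvals[M - 1]          # ap(M) = p(M): largest test is unadjusted
    for m in range(M - 2, -1, -1):     # m = M-1 down to 1 (0-based indexing here)
        candidate = pvals[m] * M / (m + 1)
        # steps 2b/2c: a smaller adjusted value later in the list pulls this one down
        adj[m] = min(candidate, adj[m + 1])
    return adj

# the ten p-values used in the SAS and R examples below
ps = [.001, .001, .001, .001, .001, .03, .035, .04, .05, .05]
print([round(p, 4) for p in bh_adjust(ps)])
```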

To examine the impact of the procedure on the adjusted p-values for the individual tests, we'll compare the candidate ap(m) from step 2a against the actual ap(m). Our sense is that to the degree these are different, the adjusted p-value should not be extracted from the context of the observed family of tests.


SAS
Our SAS code relies heavily on the array statement (section 1.11.5). We loop through the p-values from largest to smallest, calculating the candidate fdr p-value as above, before arriving at the final adjusted p-value. To compare the values conveniently, we make a new data set with two copies of the original data set, renaming first the candidate and then the adjusted p-values to have the same names. The in = data set option creates a temporary variable which identifies which data set an observation was read from; here it denotes which version of the same data set (and which set of p-values) was used.

data fdr;
  array pvals [10] pval1 - pval10
    (.001 .001 .001 .001 .001 .03 .035 .04 .05 .05);
  array cfdrpvals [10] cfdr1 - cfdr10;
  array fdrpvals [10] fdr1 - fdr10;
  fdrpvals[10] = pvals[10];
  do i = 9 to 1 by -1;
    cfdrpvals[i] = pvals[i] * 10/i;
    if cfdrpvals[i] > fdrpvals[i+1] then fdrpvals[i] = fdrpvals[i+1];
    else fdrpvals[i] = cfdrpvals[i];
  end;
run;

data compare;
  set fdr (in = cfdr rename=(cfdr1=c1 cfdr2=c2 cfdr3=c3 cfdr4=c4
           cfdr5=c5 cfdr6=c6 cfdr7=c7 cfdr8=c8 cfdr9=c9))
      fdr (in = fdr rename=(fdr1=c1 fdr2=c2 fdr3=c3 fdr4=c4 fdr5=c5
           fdr6=c6 fdr7=c7 fdr8=c8 fdr9=c9));
  if cfdr then adjustment = "Candidate fdr";
  if fdr then adjustment = "Final fdr";
run;

proc print data = compare; var adjustment c1-c9; run;

adjustment     c1     c2    c3     c4     c5    c6   c7   c8   c9

Candidate fdr  0.010  .005  .0033  .0025  .002  .05  .05  .05  .055
Final fdr      0.002  .002  .0020  .0020  .002  .05  .05  .05  .050

(We omit the last p-value because the adjustment does not affect it.) The result shows that for many of the tests in this family, the final FDR p-value is substantially smaller than the candidate value. To this degree, the FDR p-value depends on the observed p-values of the other tests in the family, and ought not to be removed from the context of those tests. We would recommend caution in displaying FDR p-values in such settings, given readers' propensity to use them as if they were ordinary p-values, safely adjusted for multiple testing.

R
Comparison of the R and SAS code may make SAS programmers weep. The candidate values are easily calculated, and can be presented with the final p-values in one step using the p.adjust() function. Three lines of code, albeit incorporating multiple functions in each line. (And it could sensibly be done in two, calculating the candidate p-values within the rbind() function call.) Note especially the line calculating the candidate p-values, in which vectorization allows a for loop to be avoided in a very natural fashion.

fakeps = c(rep(.2, 5), 6, 7, 8, 10, 10)/200
cfdr = fakeps * 10/(1:10)
rbind(cfdr, fdr=p.adjust(fakeps, "fdr"))

     [,1]  [,2]  [,3]   [,4]   [,5]  [,6] [,7] [,8] [,9]   [,10]
cfdr 0.010 0.005 0.0033 0.0025 0.002 0.05 0.05 0.05 0.0556 0.05
fdr  0.002 0.002 0.0020 0.0020 0.002 0.05 0.05 0.05 0.0500 0.05


An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers and PROC-X with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Monday, May 7, 2012

Example 9.30: addressing multiple comparisons


We've been more sensitive to accounting for multiple comparisons recently, in part due to work that Nick and colleagues published on the topic.

In this entry, we consider results from a randomized trial (Kypri et al., 2009) to reduce problem drinking in Australian university students.
Seven outcomes were pre-specified: three designated as primary and four as secondary. No adjustment for multiple comparisons was undertaken. The p-values were given as 0.001, 0.001, and 0.001 for the primary outcomes and 0.02, 0.22, 0.59, and 0.87 for the secondary outcomes.
In this entry, we detail how to adjust for multiplicity using R and SAS.

R

The p.adjust() function in R calculates multiplicity adjustments from a vector of p-values using a variety of approaches. These include the Bonferroni procedure (where alpha is divided by the number of tests or, equivalently, each p-value is multiplied by that number and truncated to 1 if the product exceeds 1). Other, less conservative corrections are also included: Holm (1979), Hochberg (1988), Hommel (1988), Benjamini and Hochberg (1995), and Benjamini and Yekutieli (2001). The first four of these provide strong control of the family-wise error rate, and all dominate the Bonferroni procedure. Here we compare the unadjusted p-values with the Benjamini-Hochberg (method="BH") and Bonferroni procedures for the Kypri et al. study.

pvals = c(.001, .001, .001, .02, .22, .59, .87)
BONF = p.adjust(pvals, "bonferroni")
BH = p.adjust(pvals, "BH")
res = cbind(pvals, BH=round(BH, 3), BONF=round(BONF, 3))

This yields the following results:

pvals BH BONF
[1,] 0.001 0.002 0.007
[2,] 0.001 0.002 0.007
[3,] 0.001 0.002 0.007
[4,] 0.020 0.035 0.140
[5,] 0.220 0.308 1.000
[6,] 0.590 0.688 1.000
[7,] 0.870 0.870 1.000

The only substantive difference among the unadjusted and adjusted p-values is seen for the 4th most significant outcome, which remains statistically significant at the alpha=0.05 level under every procedure except the Bonferroni.
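As a quick cross-check (a Python sketch we've added, not part of the original analysis), the Bonferroni column above can be reproduced by hand: multiply each p-value by the number of tests and truncate at 1.

```python
pvals = [.001, .001, .001, .02, .22, .59, .87]
# min(1, M * p) for each of the M = 7 tests, rounded for display
bonf = [round(min(1.0, p * len(pvals)), 3) for p in pvals]
print(bonf)  # 0.007, 0.007, 0.007, 0.14, 1.0, 1.0, 1.0
```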

It is straightforward to graphically display these results (as seen above):

matplot(res, ylab="p-values", xlab="sorted outcomes")
abline(h=0.05, lty=2)
matlines(res)
legend(1, .9, legend=c("Bonferroni", "Benjamini-Hochberg", "Unadjusted"),
col=c(3, 2, 1), lty=c(3, 2, 1), cex=0.7)

It bears mentioning here that the Benjamini-Hochberg procedure really only makes sense in the gestalt. That is, it would probably be incorrect to take the adjusted p-values from above and remove them from the context of the 7 tests performed here. The correct use (as with all tests) is to pre-specify the alpha level and reject tests whose p-values are smaller. What p.adjust() reports is the smallest family-wise alpha under which each of the tests would result in a rejection of the null hypothesis. But by the nature of the Benjamini-Hochberg procedure, this value may well depend on the other observed p-values. We will explore this further in a later entry.
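To make that dependence concrete, here is a small Python sketch (with made-up p-values; `bh_adjust` is our own helper mirroring the BH adjustment): two families share the same smallest p-value, yet that test's adjusted value differs because of its companions.

```python
def bh_adjust(pvals):
    """BH-adjusted p-values; assumes pvals is sorted in increasing order."""
    M = len(pvals)
    adj = pvals[:]                      # the largest test keeps its raw p-value
    for m in range(M - 2, -1, -1):
        # a smaller adjusted value later in the list pulls this one down
        adj[m] = min(pvals[m] * M / (m + 1), adj[m + 1])
    return adj

family1 = [0.010, 0.011, 0.012]  # hypothetical: close-together p-values
family2 = [0.010, 0.400, 0.900]  # same first test, different companions
print(round(bh_adjust(family1)[0], 3))  # 0.012
print(round(bh_adjust(family2)[0], 3))  # 0.03
```

The first test's raw p-value is identical in both families, but its BH-adjusted value is pulled down to 0.012 in the first family and left at 0.03 (the Bonferroni-style candidate) in the second.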

SAS
The multtest procedure will perform a number of multiple testing procedures. It works with raw data for ANOVA models, and can also accept a list of p-values as shown here. (Note that "FDR" (false discovery rate) is the name that Benjamini and Hochberg give to their procedure and that this nomenclature is used by SAS.) Various other procedures can do some adjustment through, e.g., the estimate statement, but multtest is the most flexible. A plot similar to that created in R is shown below.

data a;
input Test$ Raw_P @@;
datalines;
test01 0.001 test02 0.001 test03 0.001
test04 0.02 test05 0.22 test06 0.59
test07 0.87
;

proc multtest inpvalues=a bon fdr plots=adjusted(unpack);
run;

                             False
                             Discovery
Test     Raw    Bonferroni   Rate

   1   0.0010       0.0070   0.0023
   2   0.0010       0.0070   0.0023
   3   0.0010       0.0070   0.0023
   4   0.0200       0.1400   0.0350
   5   0.2200       1.0000   0.3080
   6   0.5900       1.0000   0.6883
   7   0.8700       1.0000   0.8700



