Monday, June 30, 2014

Example 2014.8: Estimate power for an interaction, by simulation

In our last entry, we demonstrated how to simulate data from a logistic regression with an interaction between a dichotomous and a continuous covariate. In this entry we show how to use the simulation to estimate the power to detect that interaction. The idea is simple, elegant, and powerful: simulate data under the alternative, fit the model, and count the proportion of simulated data sets in which the null is rejected. That proportion is an estimate of the power. Since we lack infinite time to simulate data sets, the estimate carries Monte Carlo uncertainty, so we also generate a confidence interval for the proportion.

R
In R, extending the previous example is almost trivially easy. The coef() function, applied to a glm summary object, returns a matrix with the parameter estimates, standard errors, test statistics, and p-values, one row per term. In one statement, we can extract the p-value for the interaction and return an indicator of a rejected null hypothesis. (This line is commented in the code below.) Then the routine is wrapped as a small function.
logist_inter = function() {
  c = rep(0:1,each=50)  # sample size is 100
  x = rnorm(100)
  lp = -3 + 2*c*x
  link_lp = exp(lp)/(1 + exp(lp))
  y = (runif(100) < link_lp) 

  log.int = glm(y~as.factor(c)*x, family=binomial)
  reject = ifelse( coef(summary(log.int))[4,4] < .05, 1, 0)
      # The coef() function above gets the parameter estimates; the [4,4] 
      # element is the p-value for the interaction.
  return(reject)
}
Running the function many times is also trivial, using the replicate() function.
pow1 = replicate(100, logist_inter())
The result is a vector of 1s and 0s. To get the estimated power and confidence limits, we use the binom.test() function.
binom.test(sum(pow1), 100)
The test gives a p-value against the null hypothesis that the probability of rejection is 0.5, which is not interesting. The interesting part is at the end.
95 percent confidence interval:
 0.3219855 0.5228808
sample estimates:
probability of success 
                  0.42 
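If all we want is the interval itself, it can be extracted directly from the object that binom.test() returns:
binom.test(sum(pow1), 100)$conf.int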
It would be simple to adjust this code to allow changes in the number of subjects, the effect sizes, and so on; one way to do that is sketched below.
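For instance, here is one way the function might be parameterized. The argument names n, intercept, and interaction are our own choices for this sketch, not part of the original code, and n is assumed to be even.
logist_inter2 = function(n = 100, intercept = -3, interaction = 2, alpha = 0.05) {
  c = rep(0:1, each = n/2)   # n assumed even
  x = rnorm(n)
  lp = intercept + interaction*c*x
  link_lp = exp(lp)/(1 + exp(lp))
  y = (runif(n) < link_lp)

  log.int = glm(y~as.factor(c)*x, family=binomial)
  # element [4,4] of the coefficient matrix is the interaction p-value
  return(ifelse(coef(summary(log.int))[4,4] < alpha, 1, 0))
}
The power with, say, 200 subjects could then be estimated with
pow2 = replicate(100, logist_inter2(n = 200))
binom.test(sum(pow2), 100)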

SAS
In SAS, generating the data is no trouble, but evaluating the power programmatically requires several relatively cumbersome steps. To generate multiple data sets, we include the data generation loop from the previous entry within another loop. (Note that the number of observations has also been reduced vs. the previous entry.)
data test;
do ds = 1 to 100;   * 100 data sets;
  do i = 1 to 100;  * 100 obs per data set;
    c = (i gt 50);
    x = normal(0);
    lp = -3 + 2*c*x;
    link_lp = exp(lp)/(1 + exp(lp));
    y = (uniform(0) lt  link_lp); 
    output;
  end;
end;
run;

Then we fit all of the models at once, using the by statement. Here, the ODS system suppresses voluminous output and is also used to capture the needed results in a single data set. The name of the piece of output that holds the parameter estimates (parameterestimates) can be found with the ods trace on statement.
ods select none;
ods output parameterestimates= int_ests;
proc logistic data = test ;
  by ds;
  class c (param = ref desc);
  model y(event='1') = x|c;
run;
ods exclude none;

The univariate procedure can be used to count the number of times the null hypothesis of no interaction would be rejected. To do this, we use the loccount option to request a table of location counts, and the mu0 option to specify that the location of interest is 0.05. As above, since our goal is to use the count programmatically, we also extract the result into a data set. If you're following along at home, it's probably worth your while to print out some of this data to see what it looks like.
ods output locationcounts=int_power;
proc univariate data = int_ests loccount mu0=.05;
  where variable = "x*c";
  var probchisq;
run;
For example, while the locationcounts data set reports the number of observations above and below 0.05, it also reports the number not equal to 0.05. That row is not useful here, so we exclude it from the next step with a where statement. Then proc freq gives us the proportion and 95% confidence limits we need, using the binomial option to request the confidence limits and the weight statement to indicate that the value variable contains the number of observations in each row.
proc freq data = int_power;
  where count ne "Num Obs ^= Mu0";
  tables count / binomial;
  weight value;
run;
Finally, we find our results:
                        Binomial Proportion
                       Count = Num Obs < Mu0

                  Proportion                0.4000
                  ASE                       0.0490
                  95% Lower Conf Limit      0.3040
                  95% Upper Conf Limit      0.4960

                  Exact Conf Limits
                  95% Lower Conf Limit      0.3033
                  95% Upper Conf Limit      0.5028
We estimate our power at only 40%, with a 95% confidence interval of roughly (30%, 50%). This agrees closely enough with R: we don't need to narrow the interval to know that we'll need a larger sample size.

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Tuesday, June 24, 2014

Example 2014.7: Simulate logistic regression with an interaction

Reader Annisa Mike asked in a comment on an early post about power calculation for logistic regression with an interaction.

This is a topic that has come up with increasing frequency in grant proposals and article submissions. We'll begin by showing how to simulate data with the interaction, and in our next post we'll show how to assess power to detect the interaction using simulation.

As in our earlier post, our method is to construct the linear predictor and the link function separately. This should help to clarify the roles of the parameter values and the simulated data.

SAS
In keeping with Annisa Mike's question, we'll simulate the interaction between a categorical and a continuous covariate. We'll make the categorical covariate dichotomous and the continuous one normal. To keep things simple, we'll leave the main effects null-- that is, both main-effect coefficients are 0, so the continuous covariate has no effect when the dichotomous covariate is 0.
data test;
do i = 1 to 1000;
    c = (i gt 500);
    x = normal(0);
    lp = -3 + 2*c*x;
    link_lp = exp(lp)/(1 + exp(lp));
    y = (uniform(0) lt link_lp); 
    output;
end;
run;
In proc logistic, unlike many other procedures, the default parameterization for categorical predictors is effect cell coding. This can lead to unexpected and confusing results. To get reference cell coding, use the syntax for the class statement shown below. This is similar to the default result for the glm procedure. If you need identical behavior to the glm procedure, use param = glm. The desc option re-orders the categories to use the smallest value as the reference category.
proc logistic data = test plots(only)=effect(clband);
class c (param = ref desc);
model y(event='1') = x|c;
run;
The plots(only)=effect(clband) construction in the proc logistic statement generates the plot shown above. If c=0, the probability y=1 is small for any value of x, and a slope of 0 for x is tenable. If c=1, the probability y=1 increases as x increases and nears 1 for large values of x.

The parameters estimated from the data show good fidelity to the selected values, though this is merely good fortune-- in repeated simulations we would often see estimates farther from the truth than these.
                              Standard         Wald
Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq

Intercept      1    -3.0258     0.2168     194.7353       <.0001
x              1    -0.2618     0.2106       1.5459       0.2137
c         1    1    -0.0134     0.3387       0.0016       0.9685
x*c       1    1     2.0328     0.3168      41.1850       <.0001


R
As sometimes occurs, the R code resembles the SAS code. Creating the data, in fact, is quite similar. The main differences are the names of the functions that generate the random values and the vectorized R syntax, which avoids the explicit looping of the SAS data step.
c = rep(0:1,each=500)
x = rnorm(1000)
lp = -3 + 2*c*x
link_lp = exp(lp)/(1 + exp(lp))
y = (runif(1000) < link_lp) 
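As an aside, the inverse logit written out above could also be computed with R's built-in plogis() function (the logistic distribution function), which gives the same result:
link_lp = plogis(lp)   # equivalent to exp(lp)/(1 + exp(lp))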


We fit the logistic regression with the glm() function, and examine the parameter estimates.
log.int = glm(y~as.factor(c)*x, family=binomial)
summary(log.int)


Here, the estimate for the interaction term is further from the selected value than the one we lucked into with the SAS simulation, but the true value is well within any reasonable confidence interval.
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -3.3057     0.2443 -13.530  < 2e-16 ***
as.factor(c)1     0.4102     0.3502   1.171    0.241    
x                 0.2259     0.2560   0.883    0.377    
as.factor(c)1:x   1.7339     0.3507   4.944 7.66e-07 ***


A simple plot of the predicted values can be made fairly easily. The predicted probabilities of y=1 reside in the fitted model object as log.int$fitted.values. We can color them according to the values of the categorical predictor by defining a color vector and then choosing a value from the vector for each observation. The resulting plot is shown below. If we wanted confidence bands as in the SAS example, we could get standard errors for the (logit-scale) predicted values using the predict() function with the se.fit option, as sketched after the plotting code.
mycols = c("red","blue")
plot(log.int$fitted.values ~ x, col=mycols[c+1])
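If we did want approximate bands, here is a minimal sketch that adds them to the plot just drawn. It assumes the objects c, x, log.int, and mycols from above are still in the workspace, and uses a 1.96 multiplier on the logit-scale standard errors for pointwise 95% limits.
pred = predict(log.int, se.fit = TRUE)           # predictions on the logit (link) scale
upper = plogis(pred$fit + 1.96 * pred$se.fit)    # back-transform to probabilities
lower = plogis(pred$fit - 1.96 * pred$se.fit)
for (g in 0:1) {                                 # add dashed bands, one group at a time
  keep = (c == g)
  ord = order(x[keep])
  lines(x[keep][ord], upper[keep][ord], col = mycols[g + 1], lty = 2)
  lines(x[keep][ord], lower[keep][ord], col = mycols[g + 1], lty = 2)
}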



Thursday, June 12, 2014

Example 2014.6: Comparing medians and the Wilcoxon rank-sum test

A colleague recently contacted us with the following question: "My outcome is skewed-- how can I compare medians across multiple categories?"

What they were asking for was a generalization of the Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test, among other monikers) to more than two groups. For the record, the answer is that the Kruskal-Wallis test is the generalization of the Wilcoxon to more than two groups, just as one-way ANOVA generalizes the t-test.
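A minimal sketch of the Kruskal-Wallis test in R, using made-up data for three groups (the vectors y3 and g3 below are purely illustrative and not used elsewhere in this post):
y3 = c(rexp(50), rnorm(50) + log(2), runif(50))   # three arbitrary groups
g3 = as.factor(rep(1:3, each = 50))
kruskal.test(y3, g3)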

But this question is based on a false premise: that the Wilcoxon rank-sum test is used to compare medians. The premise rests on a misunderstanding of the null hypothesis of the test. The actual null hypothesis is that a random value from one population has a 50% probability of exceeding a random value from the other population. The practical value of this is hard to see, and thus in many places, including textbooks, the null hypothesis is presented as "the two populations have equal medians". The actual null hypothesis is equivalent to this equal-medians hypothesis, but only under the additional assumption that the shapes of the distributions are identical in the two groups.

In other words, interpreting the test as a comparison of medians requires a location-shift-only alternative. Since this rarely holds, and is rarely assessed, we should probably use extreme caution in applying, and especially in interpreting, the Wilcoxon rank-sum test.

To demonstrate this issue, we present a simple simulation with two samples that have equal medians but very different shapes. In one group, the values are exponential with mean 1 and therefore median log 2; in the other, they are normal with mean (and median) log 2 and variance 1. We generate 10,000 observations in each group and show that the Wilcoxon rank-sum test rejects the null hypothesis. If we incorrectly interpret the null as applying to the medians, we will be misled. If our interest actually centered on the medians for some reason, an appropriate test that is not sensitive to the shapes of the distributions can be found in quantile regression. Another, of course, would be the median test (a by-hand version appears after the quantile regression output below). We show that the quantile regression does not reject the null hypothesis of equal medians, even with the large sample size.

We leave aside the deeper questions of whether comparing medians is a useful substitute for comparing means, or whether means should not be compared when distributions are skewed.

R
In R, we simulate two separate vectors of data, then feed them directly to the wilcox.test() function (section 2.4.2).
y1 = rexp(10000)
y2 = rnorm(10000) + log(2)

wilcox.test(y1,y2)
This shows a very small p-value, reflecting not that the medians are unequal but that one of these distributions tends to produce larger values than the other. Hint: it might be the one with the long right tail and no negative values.
 Wilcoxon rank sum test with continuity correction

data:  y1 and y2
W = 55318328, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
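Incidentally, the W statistic divided by the product of the two sample sizes estimates exactly the quantity in the null hypothesis described above: the probability that a random value from the first sample exceeds a random value from the second. A quick check:
w = wilcox.test(y1, y2)$statistic
w / (length(y1) * length(y2))   # about 0.55 here, rather than 0.5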
To set up the quantile regression, we put the observations into a single vector and make a group indicator vector. Then we load the quantreg package and use the rq() function (section 4.4.4) to fit the median regression.
y = c(y1,y2)
c = rep(1:2, each = 10000)

library(quantreg)
summary(rq(y~as.factor(c)))
This shows we would fail to reject the null of equal medians.
Call: rq(formula = y ~ as.factor(c))

tau: [1] 0.5

Coefficients:
              Value    Std. Error t value  Pr(>|t|)
(Intercept)    0.68840  0.01067   64.53047  0.00000
as.factor(c)2 -0.00167  0.01692   -0.09844  0.92159
Warning message:
In rq.fit.br(c, y, tau = tau, ...) : Solution may be nonunique
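The median test mentioned above can also be carried out by hand, by comparing the proportions of observations above the pooled median in the two groups. This is a sketch of our own, not output from the quantreg package:
# Mood's median test, by hand: compare proportions above the pooled median
pooled_med = median(c(y1, y2))
above = c(sum(y1 > pooled_med), sum(y2 > pooled_med))
prop.test(above, c(length(y1), length(y2)))   # should not reject, since the medians are equal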


SAS
In SAS, we make a data set with a group indicator and use it to generate data conditionally.
data wtest;
do i = 1 to 20000;
  c = (i gt 10000);
  if c eq 0 then y = ranexp(0);
    else y = normal(0) + log(2);
  output;
  end;
run;
The Wilcoxon rank-sum test is in proc npar1way (section 2.4.2).
proc npar1way data = wtest wilcoxon;
class c;
var y;
run;
As with R, we find a very small p-value.
                       The NPAR1WAY Procedure

             Wilcoxon Scores (Rank Sums) for Variable y
                      Classified by Variable c

                     Sum of     Expected      Std Dev         Mean
  c          N       Scores     Under H0     Under H0        Score

  0      10000    106061416    100005000   408258.497   10606.1416
  1      10000     93948584    100005000   408258.497    9394.8584


                      Wilcoxon Two-Sample Test

                Statistic             106061416.0000

                Normal Approximation
                Z                            14.8348
                One-Sided Pr >  Z             <.0001
                Two-Sided Pr > |Z|            <.0001

                t Approximation
                One-Sided Pr >  Z             <.0001
                Two-Sided Pr > |Z|            <.0001

             Z includes a continuity correction of 0.5.


                        Kruskal-Wallis Test

                Chi-Square                  220.0700
                DF                                 1
                Pr > Chi-Square               <.0001

The median regression is in proc quantreg (section 4.4.4). As in R, we fail to reject the null of equal medians.
proc quantreg data =wtest;
class c;
model y = c;
run;
                       The QUANTREG Procedure

                  Quantile and Objective Function

              Predicted Value at Mean          0.6909


                        Parameter Estimates

                            Standard    95% Confidence
    Parameter   DF Estimate    Error        Limits       t Value

    Intercept    1   0.6909   0.0125    0.6663    0.7155   55.06
    c         0  1   0.0165   0.0160   -0.0148    0.0479    1.03
    c         1  0   0.0000   0.0000    0.0000    0.0000     .

                        Parameter Estimates

                        Parameter   Pr > |t|

                        Intercept     <.0001
                        c         0   0.3016
                        c         1    .


