Tuesday, September 30, 2014

Example 2014.11: Contrasts the basic way for R

As we discuss in section 6.1.4 of the second edition, R and SAS handle categorical variables and their parameterization in models quite differently. SAS treats them on a procedure-by-procedure basis, which leads to some odd differences in capabilities and default parameterizations. For example, in the logistic procedure, the default is effect cell coding, while in the genmod procedure, which also fits logistic regression, the default is reference cell coding. Meanwhile, many procedures can only accommodate reference cell coding.

In R, in contrast, categorical variables can be designated as "factors", and the parameterization is stored as an attribute of the factor.
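As a minimal sketch of what that means, the coding can be inspected and replaced with the base R contrasts() function, after which it travels with the factor. The toy factor f below is a made-up example, not data from this post.

```r
# Toy factor; 'f' is a hypothetical example, not from the post's data
f <- factor(c("a", "b", "c", "a", "b", "c"))

# With R's default options, unordered factors use treatment
# (reference cell) coding
contrasts(f)

# Replace it with sum-to-zero (effect) coding for a 3-level factor
contrasts(f) <- contr.sum(3)

# The coding is now stored on the factor as its "contrasts" attribute
attr(f, "contrasts")
```

Any modeling function that builds its design matrix from the factor should then pick up the stored coding without further arguments.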

In section 6.1.4, we demonstrate how the parameterization of a factor can be easily changed on the fly, in R, in lm(), glm(), and aov(), using the contrasts= option in those functions. Here we show how to set the attribute more generally, for use in functions that don't accept the option. This post was inspired by a question from Julia Kuder, of Brigham and Women's Hospital.
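For reference, the per-call approach looks roughly like the following. This sketch uses R's built-in warpbreaks data rather than anything from this post, and the object names are illustrative only.

```r
# Built-in data: number of warp breaks per loom, by tension level (L, M, H)
fit_default <- lm(breaks ~ tension, data = warpbreaks)

# Same model, requesting effect coding for 'tension' in this call only
fit_effect <- lm(breaks ~ tension, data = warpbreaks,
                 contrasts = list(tension = "contr.sum"))

coef(fit_default)  # intercept is the mean of the reference level
coef(fit_effect)   # intercept is the mean of the three level means

# The factor in the data frame itself is unchanged
attr(warpbreaks$tension, "contrasts")
```

The two fits give identical fitted values; only the meaning of the coefficients differs. Because the coding is requested in the call, nothing is stored on the factor.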

SAS
We begin by simulating censored survival data as in Example 7.30. We'll also export the data to use in R.
data simcox;
  beta1 = 2;
  lambdat = 0.002; *baseline hazard;
  lambdac = 0.004; *censoring hazard;
  do i = 1 to 10000;
    x1 = rantbl(0, .25, .25, .25);
    linpred = exp(-beta1*(x1 eq 4));
    t = rand("WEIBULL", 1, lambdat * linpred);  * time of event;
    c = rand("WEIBULL", 1, lambdac);            * time of censoring;
    time = min(t, c);                           * which came first?;
    censored = (c lt t);
    output;
  end;
run;

proc export data=simcox replace
outfile="c:/temp/simcox.csv"
dbms=csv;
run;
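For readers without SAS, a rough R analogue of the data step might look like the sketch below. It is not the code used to produce the results in this post: the seed and the object names (simcox_r, event_time, cens_time) are assumptions, and the random draws will not match the SAS output.

```r
set.seed(2014)      # arbitrary seed (assumption); SAS used seed 0
n       <- 10000
beta1   <- 2
lambdat <- 0.002    # used as the Weibull scale argument, as in the data step
lambdac <- 0.004

# Four equally likely levels, like rantbl(0, .25, .25, .25)
x1      <- sample(1:4, n, replace = TRUE)
linpred <- exp(-beta1 * (x1 == 4))

# rweibull(n, shape, scale) takes shape and scale in the same order
# as SAS's rand("WEIBULL", shape, scale)
event_time <- rweibull(n, shape = 1, scale = lambdat * linpred)
cens_time  <- rweibull(n, shape = 1, scale = lambdac)

simcox_r <- data.frame(x1       = x1,
                       time     = pmin(event_time, cens_time),
                       censored = as.integer(cens_time < event_time))
```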

Now we'll fit the data in SAS, using effect coding.
proc phreg data=simcox;
class x1 (param=effect);
model time*censored(0)= x1 ;
run;

We reproduce the rather unexciting results here for comparison with R.
                    Parameter    Standard
 Parameter     DF    Estimate       Error

 x1        1    1    -0.02698     0.03471
 x1        2    1    -0.01211     0.03437
 x1        3    1    -0.05940     0.03458


R
In R we read the data in, then use the C() function to assign the contr.sum contrast to a version of the x1 variable that we save as a factor. Once that is done, we can fit the proportional hazards regression with the desired contrast.
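library(survival)  # provides Surv() and coxph()
simcox <- read.csv("c:/temp/simcox.csv")
# C() attaches the sum-to-zero (effect) contrast to the factor itself
sc2 <- transform(simcox, x1.eff = C(as.factor(x1), contr.sum(4)))
effmodel <- coxph(Surv(time, censored) ~ x1.eff, data = sc2)
summary(effmodel)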
We excerpt the relevant output to demonstrate equivalence with SAS.
            coef exp(coef) se(coef)  
x1.eff1 -0.02698   0.97339  0.03471  
x1.eff2 -0.01211   0.98797  0.03437  
x1.eff3 -0.05940   0.94233  0.03458
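With effect coding, no coefficient is printed for the fourth level; under the sum-to-zero constraint it is minus the sum of the other three. A quick check using the estimates above (the object names est and level4 are just for illustration):

```r
# Rows are the levels of x1, columns the three fitted coefficients;
# the last level is coded -1 in every column
contr.sum(4)

# Estimates copied from the output above
est <- c(x1.eff1 = -0.02698, x1.eff2 = -0.01211, x1.eff3 = -0.05940)

# Implied effect for level 4 under the sum-to-zero constraint
level4 <- -sum(est)
level4
```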
An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.