Tuesday, September 30, 2014

Example 2014.11: Contrasts the basic way for R

As we discuss in section 6.1.4 of the second edition, R and SAS handle categorical variables and their parameterization in models quite differently. SAS treats them on a procedure-by-procedure basis, which leads to some odd differences in capabilities and default parameterizations. For example, in the logistic procedure the default is effect coding, while in the genmod procedure (which also fits logistic regression) the default is reference cell coding. Meanwhile, many procedures can only accommodate reference cell coding.
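To make the two codings concrete, here is a small sketch that prints the coding matrices R uses for a four-level factor. contr.treatment() is R's reference cell coding with the first level as reference. contr.SAS() is the same idea with the last level as reference, and contr.sum() is effect (sum-to-zero) coding, where the last level gets -1 in every column. The choice of four levels is only for illustration.

```r
# Coding matrices for a hypothetical factor with 4 levels
contr.treatment(4)  # reference cell coding, level 1 is the reference (all zeros)
contr.SAS(4)        # reference cell coding, level 4 is the reference (all zeros)
contr.sum(4)        # effect coding: level 4 is coded -1 in every column
```

Each matrix has one row per level and one column per estimated parameter (levels minus one), which is why the SAS output for a four-level variable shows three estimates.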

In R, in contrast, categorical variables can be designated as "factors", and the parameterization is stored as an attribute of the factor.
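A minimal sketch of what that attribute looks like, using a made-up three-level factor g (the data are purely illustrative). Until an attribute is set, contrasts() falls back to the global default from options("contrasts"), which is contr.treatment for unordered factors under R's default settings.

```r
# Hypothetical three-level factor with no stored parameterization
g <- factor(c("a", "b", "c", "a", "b", "c"))
attr(g, "contrasts")  # NULL: nothing stored on the factor yet
contrasts(g)          # falls back to the default coding (contr.treatment)

# Store effect (sum-to-zero) coding as an attribute of the factor itself
contrasts(g) <- contr.sum
attr(g, "contrasts")  # the 3 x 2 coding matrix now travels with g
```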

In section 6.1.4, we demonstrate how to change the parameterization of a factor on the fly in R, using the contrasts= option in lm(), glm(), and aov(). Here we show how to set the attribute more generally, for use in functions that don't accept that option. This post was inspired by a question from Julia Kuder of Brigham and Women's Hospital.
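As a sketch of why the attribute approach is the more general one, the example below fits the same one-way model twice on simulated toy data (the data frame d and its values are hypothetical): once choosing effect coding through lm()'s contrasts= argument, and once by storing contr.sum on the factor and calling lm() with no extra option. The coefficients should agree.

```r
set.seed(1)
# Hypothetical balanced one-way data: three groups of five observations
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 5)))
d$y <- unname(c(a = 1, b = 2, c = 4)[as.character(d$g)]) + rnorm(15, sd = 0.1)

# 1. Parameterization chosen on the fly with the contrasts= argument
fit_opt <- lm(y ~ g, data = d, contrasts = list(g = "contr.sum"))

# 2. Parameterization stored on the factor, so no option is needed
d2 <- d
contrasts(d2$g) <- contr.sum
fit_attr <- lm(y ~ g, data = d2)

all.equal(coef(fit_opt), coef(fit_attr))
```

With sum-to-zero coding the intercept is the unweighted mean of the group means, and each coefficient is a group's deviation from it.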

SAS
We begin by simulating censored survival data as in Example 7.30. We'll also export the data to use in R.
data simcox;
  beta1 = 2;
  lambdat = 0.002;   * baseline hazard;
  lambdac = 0.004;   * censoring hazard;
  do i = 1 to 10000;
    x1 = rantbl(0, .25, .25, .25);
    linpred = exp(-beta1*(x1 eq 4));
    t = rand("WEIBULL", 1, lambdat * linpred);   * time of event;
    c = rand("WEIBULL", 1, lambdac);             * time of censoring;
    time = min(t, c);                            * which came first?;
    censored = (c lt t);
    output;
  end;
run;

proc export data=simcox replace
outfile="c:/temp/simcox.csv"
dbms=csv;
run;

Now we'll fit the model in SAS, using effect coding.
proc phreg data=simcox;
class x1 (param=effect);
model time*censored(0)= x1 ;
run;

We reproduce the rather unexciting results here for comparison with R.
                          Parameter    Standard
Parameter   Level   DF    Estimate     Error
x1          1       1     -0.02698     0.03471
x1          2       1     -0.01211     0.03437
x1          3       1     -0.05940     0.03458
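Only three estimates appear because, under effect coding, the four level effects are constrained to sum to zero. The omitted level 4 can therefore be recovered as minus the sum of the three reported estimates; this short calculation uses the values from the table above.

```r
# Effect-coded estimates for levels 1-3, copied from the PROC PHREG output
b <- c(x1_1 = -0.02698, x1_2 = -0.01211, x1_3 = -0.05940)

# Sum-to-zero constraint: the level-4 effect is minus the sum of the others
b4 <- -sum(b)
b4  # 0.09849
```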

R
In R we read the data in, then use the C() function to assign the contr.sum contrast to a version of the x1 variable that we save as a factor. Once that is done, we can fit the proportional hazards regression with the desired contrast.
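Here is a sketch of those steps, assuming the survival package's coxph() and Surv() for the proportional hazards fit. The path is the one used in the SAS export above. If that file is not present, the code simulates data of the same shape instead; that fallback (and the seed and variable names it uses) is our assumption for making the sketch self-contained, not part of the original workflow.

```r
library(survival)  # provides coxph() and Surv()

csv_path <- "c:/temp/simcox.csv"
if (file.exists(csv_path)) {
  simcox <- read.csv(csv_path)  # file written by PROC EXPORT above
} else {
  # Assumed fallback: mimic the SAS data step.
  # SAS rand("WEIBULL", 1, b) is a Weibull with shape 1 and scale b.
  set.seed(2014)
  n <- 10000
  x1 <- sample(1:4, n, replace = TRUE)
  event_time <- rweibull(n, shape = 1, scale = 0.002 * exp(-2 * (x1 == 4)))
  cens_time <- rweibull(n, shape = 1, scale = 0.004)
  simcox <- data.frame(x1 = x1,
                       time = pmin(event_time, cens_time),
                       censored = as.numeric(cens_time < event_time))
}

# C() returns a factor carrying a "contrasts" attribute; contr.sum gives
# effect (sum-to-zero) coding, analogous to param=effect in PROC PHREG
simcox$x1f <- C(factor(simcox$x1), contr.sum)

# Surv(time, censored) treats censored == 0 as a censored observation,
# mirroring the SAS statement model time*censored(0)
fit <- coxph(Surv(time, censored) ~ x1f, data = simcox)
summary(fit)
```

Because coxph() reads the coding from the factor's attribute, no contrasts= argument is needed in the call, and the three coefficients should be comparable to the SAS estimates for levels 1 through 3.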