SAS and R: survival package

Showing posts with label survival package. Show all posts

Tuesday, March 13, 2012

Example 9.23: Demonstrating proportional hazards

A colleague recently asked after a slide suitable for explaining proportional hazards. In particular, she was concerned that her audience not focus on the time to event or probability of the event. An initial thought was to display the cumulative hazards, which have a constant proportion if the model is true. But the colleague's audience might get distracted both by the language (what's a "hazard"?) and the fact that the cumulative hazard doesn't have a readily interpretable scale. The failure curve, meaning the probability of failure by time t, over time, might be a bit more accessible.

Rather than just draw some curves, we simulated data, based on the code we demonstrated previously. In this case, there's no need for any interesting censoring, but a more interesting survival curve seems worthwhile.

SAS
The more interesting curve is introduced by manually accelerating and slowing down the Weibull survival time demonstrated in the previous approach. We also trichotomized one of the exposures to match the colleague's study, and censored all values greater than 10 to keep focus where the underlying hazard was interesting.


data simcox;
  beta1 = .2;
  beta2 = log(1.25);
  lambdat = 20; *baseline hazard;
  do i = 1 to 10000;
    x1 = normal(45);
    x2 = (normal(0) gt -1) + (normal(0) gt 1);
    linpred = -beta1*x1 - beta2*x2;
    t = rand("WEIBULL", 1, lambdaT * exp(linpred));
 if t gt 5 then t = 5 + (t-5)/10;
 if t gt 7 then t = 7 + (t-7) * 20;
    * time of event;
 censored = (t > 10);
    output;
  end;
run;

The phreg procedure will fit the model and produces nice plots of the survival function and the cumulative hazard. But to generate useful versions of these, you need to make an additional data set with the covariate values you want to show plots for. We set the other covariate's value at 0. You can also include an id variable with some descriptive text.


data covars;
x1 = 0; x2 = 2; id = "Arm 3"; output;
x1 = 0; x2 = 1; id = "Arm 2"; output;
x1 = 0; x2 = 0; id = "Arm 1"; output;
run;

proc phreg data = simcox plot(overlay)=cumhaz;
class x2 (ref = "0");
baseline covariates = covars out= kkout cumhaz = cumhaz 
    survival = survival / rowid = id;
model t*censored(1) = x1 x2;
run;

The cumulative hazard plot generated by the plot option is shown below, demonstrating the correct relative hazard of 1.25. The related survival curve could be generated with plot = s .

To get the desired plot of the failure times, use the out = option to the baseline statement. This generates a data set with the listed statistics (here the cumulative hazard and the survival probability, across time). Then we can produce a plot using the gplot procedure, after generating the failure probability.


data kk2;
set kkout;
iprob = 1 - survival;
run;

goptions reset=all;
legend1 label = none value=(h=2);
axis1 order = (0 to 1 by .25) minor = none value=(h=2)
  label = (a = 90 h = 3 "Probability of infection");
axis2 order = (0 to 10 by 2) minor = none value=(h=2)
  label = (h=3 "Attributable time");;
symbol1 i = sm51s v = none w = 3 r = 3;
proc gplot data = kk2;
plot iprob * t = id / vaxis = axis1 haxis = axis2 legend=legend1;
run; quit;

The result is shown at the top. Note the use of the h= option in various places in the axis and legend statement to make the font more visible when shrunk to fit onto a slide. The smoothing spline plotted through the data with the smXXs interpolation makes a nice shape out of the underlying abrupt changes in the hazard. The symbol, legend, and axis statements are discussed in chapter 6.

R
As in SAS, we begin by simulating the data. Note that we use the simple categorical variable simulator mentioned in the comments for example 7.20.


n = 10000
beta1 = .2
beta2 = log(1.25)
lambdaT = 20

x1 = rnorm(n,0)
x2 = sample(0:2,n,rep=TRUE,prob=c(1/3,1/3,1/3))
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1 - beta2*x2)) 
T[T>5] = 5 + (T[T>5] -5)/10
T[T>7] = 7 + (T[T>7] -7) * 20
event = T < rep(10,n)

Now we can fit the model using the coxph() function from the Survival package. There is a default method for plot()ing the "survfit" objects resulting from several package functions. However, it shows the typical survival plot. To show failure probability instead, we'll manually take the complement of the survival probability, but still take advantage of the default method. Note the use of the xmax option to limit the x-axis. The results (below) are somewhat bland, and it's unclear if the lines can be colored differently, or their widths increased. They are also as angular as the cumulative hazards shown in the SAS implementation.


library(survival)
plotph = coxph(Surv(T, event)~ x1 * strata(x2), 
     method="breslow")
summary(plotph)
sp = survfit(plotph)
sp$surv = 1 - sp$surv
plot(sp, xmax=10)

Consequently, as is so often the case, presentation graphics require more manual fiddling. We'll begin by extracting the data we need from the "survfit" object. We'll take just the failure times and probabilities, as well as the name of the strata to which each observation belongs, limiting the data to time < 10. The last of these lines uses the names() function to pull the names of the strata, repeating each an appropriate number of times with the useful rep() function.


sp = survfit(plotph)
failtimes = sp$time[sp$time <10]
failprobs = 1 - sp$surv[sp$time <10]
failcats = c(rep(names(sp$strata),times=sp$n))[sp$time <10]

All that remains is plotting the data, which is not dissimilar to many examples in the book and in this blog. There's likely some way to make these three lines with a little less typing, but knowing how to do it from scratch gives you the most flexibility. It proved difficult to get the desired amount of smoothness from the loess(), lowess(), or supsmu() functions, but smooth.spline() served admirably. The code below demonstrates increasing the axis and tick label sizes


plot(failprobs~failtimes, type="n", ylim=c(0,1), cex.lab=2, cex.axis= 1.5, 
     ylab= "Probability of infection", xlab = "Attributable time")
lines(smooth.spline(y=failprobs[failcats == "x2=0"], 
     x=failtimes[failcats == "x2=0"],all.knots=TRUE, spar=1.8), 
     col = "blue", lwd = 3)
lines(smooth.spline(y=failprobs[failcats == "x2=1"], 
     x=failtimes[failcats == "x2=1"],all.knots=TRUE, spar=1.8), 
     col = "red", lwd = 3)
lines(smooth.spline(y=failprobs[failcats == "x2=2"], 
     x=failtimes[failcats == "x2=2"],all.knots=TRUE, spar=1.8),
     col = "green", lwd = 3)
legend(x=7,y=0.4,legend=c("Arm 1", "Arm 2", "Arm 3"), 
     col = c("blue","red","green"), lty = c(1,1,1), lwd = c(3,3,3) )

Tuesday, September 27, 2011

Example 9.7: New stuff in SAS 9.3-- Frailty models

Shared frailty models are a way of allowing correlated observations into proportional hazards models. Briefly, instead of l_i(t) = l_0(t)e^(x_iB), we allow l_ij(t) = l_0(t)e^(x_ijB + g_i), where observations j are in clusters i, g_i is typically normal with mean 0, and g_i is uncorrelated with g_i'. The nomenclature frailty comes from examining the logs of the g_i and rewriting the model as l_ij(t) = l_0(t)u_i*e^(xB) where the u_i are now lognormal with median 0. Observations j within cluster i share the frailty u_i, and fail faster (are frailer) than average if u_i > 1.

In SAS 9.2, this model could not be fit, though it is included in the survival package in R. (Section 4.3.2) With SAS 9.3, it can now be fit. We explore here through simulation, extending the approach shown in example 7.30.

SAS
To include frailties in the model, we loop across the clusters to first generate the frailties, then insert the loop from example 7.30, which now represents the observations within cluster, adding the frailty to the survival time model. There's no need to adjust the censoring time.


data simfrail;
  beta1 = 2;
  beta2 = -1;
  lambdat = 0.002; *baseline hazard;
  lambdac = 0.004; *censoring hazard;
  do i = 1 to 250; *new frailty loop;
    frailty = normal(1999) * sqrt(.5);
    do j = 1 to 5; *original loop;
      x1 = normal(0);
      x2 = normal(0);
      * new model of event time, with frailty added;
      linpred = exp(-beta1*x1 - beta2*x2 + frailty);
      t = rand("WEIBULL", 1, lambdaT * linpred);
        * time of event;
      c = rand("WEIBULL", 1, lambdaC);
        * time of censoring;
      time = min(t, c);    * which came first?;
      censored = (c lt t);
    output;
 end;
  end;
run;

For comparison's sake, we replicate the naive model assuming independence:


proc phreg data=simfrail;
  model time*censored(1) = x1 x2;
run;

               Parameter   Standard                         Hazard
 Parameter DF   Estimate      Error Chi-Square Pr > ChiSq    Ratio

 x1         1    1.68211    0.05859   824.1463     <.0001    5.377
 x2         1   -0.88585    0.04388   407.4942     <.0001    0.412

The parameter estimates are rather biased. In contrast, here is the correct frailty model.


proc phreg data=simfrail;
  class i;
  model time*censored(1) = x1 x2;
  random i / noclprint;
run;
                    Cov         REML    Standard
                    Parm    Estimate       Error

                    i         0.5329     0.07995

               Parameter   Standard                         Hazard
 Parameter DF   Estimate      Error Chi-Square Pr > ChiSq    Ratio

 x1         1    2.03324    0.06965   852.2179     <.0001    7.639
 x2         1   -1.00966    0.05071   396.4935     <.0001    0.364

This returns estimates gratifyingly close to the truth. The syntax of the random statement is fairly straightforward-- the noclprint option prevents printing all the values of i. The clustering variable must be specified in the class statement. The output shows the estimated variance of the g_i.

R
In our book (section 4.16.14) we show an example of fitting the uncorrelated data model, but we don't display a frailty model. Here, we use the data generated in SAS, so we omit the data simulation in R. As in SAS, it would be a trivial extension of the code presented in example 7.30. For parallelism, we show the results of ignoring the correlation, first.


> library(survival)
> with(simfrail, coxph(formula = Surv(time, 1-censored) ~ x1 + x2))

     coef exp(coef) se(coef)     z p
x1  1.682     5.378   0.0586  28.7 0
x2 -0.886     0.412   0.0439 -20.2 0

with identical results to above. Note that the Surv function expects an indicator of the event, vs. SAS expecting a censoring indicator.

As with SAS, the syntax for incorporating the frailty is simple.


> with(simfrail, coxph(formula = Surv(time, 1-censored) ~ x1 + x2 
    + frailty(i)))

            coef se(coef) se2    Chisq DF  p
x1          2.02 0.0692   0.0662 850     1 0
x2         -1.00 0.0506   0.0484 393     1 0
frailty(i)                       332   141 0

Variance of random effect= 0.436

Here, the results differ slightly from the SAS model, but still return parameter estimates that are quite similar. We're not familiar enough with the computational methods to diagnose the differences.

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.

SAS and R

Catalogs of posts

Tuesday, March 13, 2012

Example 9.23: Demonstrating proportional hazards

Tuesday, September 27, 2011

Example 9.7: New stuff in SAS 9.3-- Frailty models

About SAS and R

Topics discussed