SAS and R: failure time analysis

Monday, June 14, 2010

Example 7.41: hazard function plotting

As we continue with our series on survival analysis, we demonstrate how to plot estimated (smoothed) hazard functions.

R

We will utilize the routines available in the muhaz package. Background information on the methods can be found in K.R. Hess, D.M. Serachitopol and B.W. Brown Hazard Function Estimators: A Simulation Study, Statistics in Medicine, 1999: 18(22):3075-3088.


ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = data.frame(dayslink=ds$dayslink, 
   linkstatus=ds$linkstatus, treat=ds$treat)

# drop subjects with missing data
smallds = na.omit(smallds)

treatds = smallds[smallds$treat==1,]
controlds = smallds[smallds$treat==0,]
rm(ds, smallds)  # clean up

library(muhaz)
haztreat = with(treatds, muhaz(dayslink, linkstatus))
hazcontrol = with(controlds, muhaz(dayslink, linkstatus))

plot(haztreat, lwd=2, xlab="Follow-up time (days)")
lines(hazcontrol, lty=2, lwd=2)
legend(200, 0.005, legend=c("Treatment", "Control"), 
   lty=1:2, lwd=2)

The treatment group has dramatically higher hazard, but this drops appreciably after 6 months. The control group hazard is low, and decreases in a roughly linear fashion.

SAS

Paul Alison includes macros to display estimates from parametric and semiparametric models in Survival Analysis Using SAS (2nd edition). We'll use the smooth macro, which is built to accept output from proc lifetest.


proc import file="c:\book\help.csv"
  out=help dbms=dlm;
  delimiter=',';
  getnames=yes;
run;

data h2;
set help;
  if nmiss(dayslink, linkstatus, treat) eq 0;
run;

proc lifetest data=h2 outsurv=allison;
  time dayslink*linkstatus(0);
  strata treat;
run;

%include "c:/ken/sasmacros/smooth.sas";
%smooth(data=allison, time=dayslink, width=25);

The proc lifetest results (not shown) indicate that group 1 is the control and group 2 is the intervention. The macro uses a simpler smoothing method than that found in R, so that the curve is bumpier and estimates near the edges are not shown.

Monday, May 24, 2010

Example 7.38: Kaplan-Meier survival estimates

In example 7.30 we demonstrated how to simulate data from a Cox proportional hazards model.

In this and the next few entries, we expand upon support in R and SAS for survival (time-to-event) models. We'll start with a small, artificial dataset of 19 subjects. Each subject contributes a pair of variables: the time and an indicator of whether the time is when the event occurred (event=TRUE) or when the subject was censored (event=FALSE).

time  event
0.5   FALSE
1     TRUE
1     TRUE
2     TRUE
2     FALSE
3     TRUE
4     TRUE
5     FALSE
6     TRUE
7     FALSE
8     TRUE
9     TRUE
10    FALSE
12    TRUE
14    FALSE
14    TRUE
17    FALSE
20    TRUE
21    FALSE

Until an instant before time=1, no events were observed (only the censored observation), so the survival estimate is 1. At time=1, 2 subjects out of the 18 still at risk observed the event, so the survival function S(.) at time 1 is S(1) = 16/18 = 0.8889. The next failure occurs at time=2, with 16 still at risk, so S(2)=15/16 * 16/18 = 0.8333. Note that in addition to the event at time=2, there is a subject censored then, so the number at risk at time=3 is just 13 (so S(3) = 13/14 * 15*16 * 16/18 = 0.7738). The calculations continue until the final event is observed.

R

In R, we use the survfit() function (section 5.1.19) within the survival library to calculate the survival function across time.

library(survival)
time =  c(0.5, 1,1,2,2,3,4,5,6,7,8,9,10,12,14,14,17,20, 21)
event = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, 
  TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, 
  TRUE, FALSE)

ds = data.frame(time, event)
fit = survfit(Surv(time, event) ~ 1, data=ds)

The returned survival object includes a number of attributes, such as the survival estimates at each timepoint, the standard error of those estimates, and the number of subjects at risk.

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  
     "surv"      "type"      "std.err"  
 [9] "upper"     "lower"     "conf.type" "conf.int"  "call" 
> summary(fit)
Call: survfit(formula = Surv(time, event) ~ 1, data = ds)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     18       2    0.889  0.0741       0.7549        1.000
    2     16       1    0.833  0.0878       0.6778        1.000
    3     14       1    0.774  0.0997       0.6011        0.996
    4     13       1    0.714  0.1084       0.5306        0.962
    6     11       1    0.649  0.1164       0.4570        0.923
    8      9       1    0.577  0.1238       0.3791        0.879
    9      8       1    0.505  0.1276       0.3078        0.829
   12      6       1    0.421  0.1312       0.2285        0.775
   14      5       1    0.337  0.1292       0.1587        0.714
   20      2       1    0.168  0.1354       0.0348        0.815

SAS

We can read in the artificial data using an input statement (section 1.1.8).

data ds;
input time event;
cards;
0.5   0
1     1
1     1
2     1
2     0
3     1
4     1
5     0
6     1
7     0
8     1
9     1
10    0
12    1
14    0
14    1
17    0
20    1
21    0
run;

proc lifetest data=ds;
  time time*event(0);
run;

Here we denote censoring as being values where event is equal to 0. If we had a censoring indicator coded in reverse (1 = censoring), the second line might read time time*censored(1);.

The survival function can be estimated in proc lifetest (as shown in section 5.1.19). In a break from our usual practice, we'll include all of the output generated by proc lifetest.

The LIFETEST Procedure
                   Product-Limit Survival Estimates
                                     Survival
                                     Standard     Number      Number 
    time     Survival    Failure      Error       Failed       Left  
  0.0000       1.0000           0           0        0          19   
  0.5000*           .           .           .        0          18   
  1.0000            .           .           .        1          17   
  1.0000       0.8889      0.1111      0.0741        2          16   
  2.0000       0.8333      0.1667      0.0878        3          15   
  2.0000*           .           .           .        3          14   
  3.0000       0.7738      0.2262      0.0997        4          13   
  4.0000       0.7143      0.2857      0.1084        5          12   
  5.0000*           .           .           .        5          11   
  6.0000       0.6494      0.3506      0.1164        6          10   
  7.0000*           .           .           .        6           9   
  8.0000       0.5772      0.4228      0.1238        7           8   
  9.0000       0.5051      0.4949      0.1276        8           7   
 10.0000*           .           .           .        8           6   
 12.0000       0.4209      0.5791      0.1312        9           5   
 14.0000       0.3367      0.6633      0.1292       10           4   
 14.0000*           .           .           .       10           3   
 17.0000*           .           .           .       10           2   
 20.0000       0.1684      0.8316      0.1354       11           1   
 21.0000*           .           .           .       11           0   

NOTE: The marked survival times are censored observations.

Summary Statistics for Time Variable time

             Quartile Estimates
             Point     95% Confidence Interval
Percent    Estimate      [Lower      Upper)
     75     20.0000     12.0000       .    
     50     12.0000      6.0000     20.0000
     25      4.0000      1.0000     12.0000

    Mean    Standard Error
 11.1776            1.9241

NOTE: The mean survival time and its standard error were underestimated because 
      the largest observation was censored and the estimation was restricted to 
      the largest event time.

Summary of the Number of Censored and Uncensored Values
                                 Percent
   Total  Failed    Censored    Censored
      19      11           8       42.11

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.

SAS and R

Catalogs of posts

Monday, June 14, 2010

Example 7.41: hazard function plotting

Monday, May 24, 2010

Example 7.38: Kaplan-Meier survival estimates

About SAS and R

Topics discussed