Showing posts with label survival model. Show all posts
Showing posts with label survival model. Show all posts

Monday, May 24, 2010

Example 7.38: Kaplan-Meier survival estimates

In example 7.30 we demonstrated how to simulate data from a Cox proportional hazards model.

In this and the next few entries, we expand upon support in R and SAS for survival (time-to-event) models. We'll start with a small, artificial dataset of 19 subjects. Each subject contributes a pair of variables: the time and an indicator of whether the time is when the event occurred (event=TRUE) or when the subject was censored (event=FALSE).

time  event
0.5   FALSE
1     TRUE
1     TRUE
2     TRUE
2     FALSE
3     TRUE
4     TRUE
5     FALSE
6     TRUE
7     FALSE
8     TRUE
9     TRUE
10    FALSE
12    TRUE
14    FALSE
14    TRUE
17    FALSE
20    TRUE
21    FALSE


Until an instant before time=1, no events were observed (only the censored observation), so the survival estimate is 1. At time=1, 2 subjects out of the 18 still at risk observed the event, so the survival function S(.) at time 1 is S(1) = 16/18 = 0.8889. The next failure occurs at time=2, with 16 still at risk, so S(2)=15/16 * 16/18 = 0.8333. Note that in addition to the event at time=2, there is a subject censored then, so the number at risk at time=3 is just 13 (so S(3) = 13/14 * 15*16 * 16/18 = 0.7738). The calculations continue until the final event is observed.


R

In R, we use the survfit() function (section 5.1.19) within the survival library to calculate the survival function across time.

library(survival)
time =  c(0.5, 1,1,2,2,3,4,5,6,7,8,9,10,12,14,14,17,20, 21)
event = c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, 
  TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, 
  TRUE, FALSE)
ds = data.frame(time, event)
fit = survfit(Surv(time, event) ~ 1, data=ds)


The returned survival object includes a number of attributes, such as the survival estimates at each timepoint, the standard error of those estimates, and the number of subjects at risk.

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  
     "surv"      "type"      "std.err"  
 [9] "upper"     "lower"     "conf.type" "conf.int"  "call" 
> summary(fit)
Call: survfit(formula = Surv(time, event) ~ 1, data = ds)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     18       2    0.889  0.0741       0.7549        1.000
    2     16       1    0.833  0.0878       0.6778        1.000
    3     14       1    0.774  0.0997       0.6011        0.996
    4     13       1    0.714  0.1084       0.5306        0.962
    6     11       1    0.649  0.1164       0.4570        0.923
    8      9       1    0.577  0.1238       0.3791        0.879
    9      8       1    0.505  0.1276       0.3078        0.829
   12      6       1    0.421  0.1312       0.2285        0.775
   14      5       1    0.337  0.1292       0.1587        0.714
   20      2       1    0.168  0.1354       0.0348        0.815



SAS

We can read in the artificial data using an input statement (section 1.1.8).

data ds;
input time event;
cards;
0.5   0
1     1
1     1
2     1
2     0
3     1
4     1
5     0
6     1
7     0
8     1
9     1
10    0
12    1
14    0
14    1
17    0
20    1
21    0
run;




proc lifetest data=ds;
  time time*event(0);
run;


Here we denote censoring as being values where event is equal to 0. If we had a censoring indicator coded in reverse (1 = censoring), the second line might read time time*censored(1);.

The survival function can be estimated in proc lifetest (as shown in section 5.1.19). In a break from our usual practice, we'll include all of the output generated by proc lifetest.

The LIFETEST Procedure
                   Product-Limit Survival Estimates
                                     Survival
                                     Standard     Number      Number 
    time     Survival    Failure      Error       Failed       Left  
  0.0000       1.0000           0           0        0          19   
  0.5000*           .           .           .        0          18   
  1.0000            .           .           .        1          17   
  1.0000       0.8889      0.1111      0.0741        2          16   
  2.0000       0.8333      0.1667      0.0878        3          15   
  2.0000*           .           .           .        3          14   
  3.0000       0.7738      0.2262      0.0997        4          13   
  4.0000       0.7143      0.2857      0.1084        5          12   
  5.0000*           .           .           .        5          11   
  6.0000       0.6494      0.3506      0.1164        6          10   
  7.0000*           .           .           .        6           9   
  8.0000       0.5772      0.4228      0.1238        7           8   
  9.0000       0.5051      0.4949      0.1276        8           7   
 10.0000*           .           .           .        8           6   
 12.0000       0.4209      0.5791      0.1312        9           5   
 14.0000       0.3367      0.6633      0.1292       10           4   
 14.0000*           .           .           .       10           3   
 17.0000*           .           .           .       10           2   
 20.0000       0.1684      0.8316      0.1354       11           1   
 21.0000*           .           .           .       11           0   

NOTE: The marked survival times are censored observations.

Summary Statistics for Time Variable time

             Quartile Estimates
             Point     95% Confidence Interval
Percent    Estimate      [Lower      Upper)
     75     20.0000     12.0000       .    
     50     12.0000      6.0000     20.0000
     25      4.0000      1.0000     12.0000

    Mean    Standard Error
 11.1776            1.9241

NOTE: The mean survival time and its standard error were underestimated because 
      the largest observation was censored and the estimation was restricted to 
      the largest event time.

Summary of the Number of Censored and Uncensored Values
                                 Percent
   Total  Failed    Censored    Censored
      19      11           8       42.11

Tuesday, March 30, 2010

Example 7.30: Simulate censored survival data

To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring.

We simulate both event times from a Weibull distribution with a scale parameter of 1 (this is equivalent to an exponential random variable). The event time has a Weibull shape parameter of 0.002 times a linear predictor, while the censoring time has a Weibull shape parameter of 0.004. A scale of 1 implies a constant (exponential) baseline hazard, but this can be modified by specifying other scale parameters for the Weibull random variables.

First we'll simulate the data, then we'll fit a Cox proportional hazards regression model (section 4.3.1) to see the results.

Simulation is relatively straightforward, and is helpful in concretizing the notation often used in discussion survival data. After setting some parameters, we generate some covariate values, then simply draw an event time and a censoring time. The minimum of these is "observed" and we record whether it was the event time or the censoring time.

SAS


data simcox;
beta1 = 2;
beta2 = -1;
lambdat = 0.002; *baseline hazard;
lambdac = 0.004; *censoring hazard;
do i = 1 to 10000;
x1 = normal(0);
x2 = normal(0);
linpred = exp(-beta1*x1 - beta2*x2);
t = rand("WEIBULL", 1, lambdaT * linpred);
* time of event;
c = rand("WEIBULL", 1, lambdaC);
* time of censoring;
time = min(t, c); * which came first?;
censored = (c lt t);
output;
end;
run;


The phreg procedure (section 4.3.1) will show us the effects of the censoring as well as the results of fitting the regression model. We use the ODS system to reduce the output.


ods select censoredsummary parameterestimates;
proc phreg data=simcox;
model time*censored(1) = x1 x2;
run;

The PHREG Procedure

Summary of the Number of Event and Censored Values

Percent
Total Event Censored Censored
10000 5971 4029 40.29

Analysis of Maximum Likelihood Estimates

Parameter Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

x1 1 1.98628 0.02213 8059.0716 <.0001
x2 1 -1.01310 0.01583 4098.0277 <.0001

Analysis of Maximum Likelihood Estimates

Hazard
Parameter Ratio

x1 7.288
x2 0.363





R


n = 10000
beta1 = 2; beta2 = -1
lambdaT = .002 # baseline hazard
lambdaC = .004 # hazard of censoring

x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2))
C = rweibull(n, shape=1, scale=lambdaC) #censoring time
time = pmin(T,C) #observed time is min of censored and true
event = time==T # set to 1 if event is observed


Having generated the data, we assess the effects of censoring with the table() function (section 2.2.1) and load the survival() library to fit the Cox model.


> table(event)
event
FALSE TRUE
4083 5917




> library(survival)
> coxph(Surv(time, event)~ x1 + x2, method="breslow")
Call:
coxph(formula = Surv(time, event) ~ x1 + x2, method = "breslow")


coef exp(coef) se(coef) z p
x1 1.98 7.236 0.0222 89.2 0
x2 -1.02 0.359 0.0160 -64.2 0

Likelihood ratio test=11369 on 2 df, p=0 n= 10000


These parameters result in data where approximately 40% of the observations are censored. The parameter estimates are similar to the true parameter values.