SAS and R: Example 7.30: Simulate censored survival data

Tuesday, March 30, 2010

Example 7.30: Simulate censored survival data

To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring.

We simulate both event times from a Weibull distribution with a scale parameter of 1 (this is equivalent to an exponential random variable). The event time has a Weibull shape parameter of 0.002 times a linear predictor, while the censoring time has a Weibull shape parameter of 0.004. A scale of 1 implies a constant (exponential) baseline hazard, but this can be modified by specifying other scale parameters for the Weibull random variables.

First we'll simulate the data, then we'll fit a Cox proportional hazards regression model (section 4.3.1) to see the results.

Simulation is relatively straightforward, and is helpful in concretizing the notation often used in discussion survival data. After setting some parameters, we generate some covariate values, then simply draw an event time and a censoring time. The minimum of these is "observed" and we record whether it was the event time or the censoring time.

SAS


data simcox;
  beta1 = 2;
  beta2 = -1;
  lambdat = 0.002; *baseline hazard;
  lambdac = 0.004; *censoring hazard;
  do i = 1 to 10000;
    x1 = normal(0);
    x2 = normal(0);
    linpred = exp(-beta1*x1 - beta2*x2);
    t = rand("WEIBULL", 1, lambdaT * linpred);
    * time of event;
    c = rand("WEIBULL", 1, lambdaC);
           * time of censoring;
    time = min(t, c);    * which came first?;
    censored = (c lt t);
    output;
  end;
run;

The phreg procedure (section 4.3.1) will show us the effects of the censoring as well as the results of fitting the regression model. We use the ODS system to reduce the output.


ods select censoredsummary parameterestimates;
proc phreg data=simcox;
  model time*censored(1) = x1 x2;
run;

The PHREG Procedure

Summary of the Number of Event and Censored Values
 
                                     Percent
   Total       Event    Censored    Censored
   10000        5971        4029       40.29

          Analysis of Maximum Likelihood Estimates
 
                Parameter    Standard
Parameter  DF    Estimate       Error  Chi-Square  Pr > ChiSq

x1          1     1.98628     0.02213   8059.0716      <.0001
x2          1    -1.01310     0.01583   4098.0277      <.0001

Analysis of Maximum Likelihood Estimates
 
             Hazard
Parameter     Ratio

x1            7.288
x2            0.363


n = 10000
beta1 = 2; beta2 = -1
lambdaT = .002 # baseline hazard
lambdaC = .004  # hazard of censoring

x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) 
C = rweibull(n, shape=1, scale=lambdaC)   #censoring time
time = pmin(T,C)  #observed time is min of censored and true
event = time==T   # set to 1 if event is observed

Having generated the data, we assess the effects of censoring with the table() function (section 2.2.1) and load the survival() library to fit the Cox model.


> table(event)
event
FALSE  TRUE 
 4083  5917


> library(survival)
> coxph(Surv(time, event)~ x1 + x2, method="breslow")
Call:
coxph(formula = Surv(time, event) ~ x1 + x2, method = "breslow")


    coef exp(coef) se(coef)     z p
x1  1.98     7.236   0.0222  89.2 0
x2 -1.02     0.359   0.0160 -64.2 0

Likelihood ratio test=11369  on 2 df, p=0  n= 10000

These parameters result in data where approximately 40% of the observations are censored. The parameter estimates are similar to the true parameter values.

14 comments:

Anonymous said...: proc lifereg data =simcox;
model t*censored(1) =x1 x2/dist =weibull;
run;

I find that the estiamted beta1 and beta2 are not mathced to those in simulation. i think i missed something. do you have any comments?; March 4, 2011 at 10:16 AM
Ken Kleinman said...: I see two potential problems. First, you have "t" in the model statement, rather than "time." This is the uncensored version. You need either

model t =x1 x2 / dist =weibull;

or

model time*censored(1) =x1 x2 / dist =weibull;

otherwise the model won't fit very well.

The other issue is that the models are interpreted very differently. The parameter estimate from PHREG is the log relative hazard associated with a one-unit increase in the covariate, while the estimate from LIFEREG is the log relative change in time to event associated with the same one-unit increase.

This means that when we see a positive est estimate in PHREG (increased hazard of event) we should expect a _negative_ estimate in LIFEREG (sooner occurrence of event).; March 4, 2011 at 11:03 AM
Anonymous said...: Cool. I got it. Maybe the following question is more basic but sort of confused to me:

linpred = exp(-beta1*x1 - beta2*x2);

I don’t see intercept and measurement error here. On the other hand, in the lifereg model, we indeed have an intercept. Can you illustrate what the estimated intercept stand for?

Thanks.; March 4, 2011 at 12:26 PM
Ken Kleinman said...: The intercept ought to reflect here exactly what it does in any regression: the expected outcome when the covariates are all set to 0. There's no measurement error, but there's Weibull variability injected directly.; March 11, 2011 at 10:22 AM
Szilard said...: Could you think a similar simulation scenario for Aalen Additive Hazards model?

I did try everything i could come up,but so far got nowhere.; September 7, 2011 at 11:23 AM
Anonymous said...: I used the same simulation program, but I got different figures for the event, beta1 and beta2, is it normal to have different figures when the same simulation program is repeated a number of times?; December 16, 2011 at 7:05 AM
Ken Kleinman said...: Very typical. Unless you set the seed for the pseudo-random number generator, you get a different stream of numbers every time, and the parameter estimates differ. This is one of the great strengths of simulation.; December 16, 2011 at 10:42 AM
Wen said...: Would you please advise how to check if the estimates for scale and shape are correct? I used the code to generate a weibull distribution with shape=2 and scale=0.0008*(-beta1*x1-beta2*x2). And using the generated dataset to run a survreg model. The transformed shape and beta's are correct, but I have no idea how to check "scale"? What the scale estimate we should expect from the regression model?Is it the original lamdaT or scale=0.0008*(-beta1*x1-beta2*x2)?; February 19, 2012 at 12:38 PM
Kennedy said...: This comment has been removed by the author.; May 21, 2015 at 5:17 AM
Kennedy said...: Dear Ken. Your blog is helpful. In your frailty model simulation, time is generated by Weibull i.e t = rand("WEIBULL", 1, lambdaT * linpred); Would you advise on how one can simulate this time so that its roughly say every 3 months i.e baseline, month 3, month 6 etc. Kennedy; May 21, 2015 at 5:18 AM
Arpan said...: How to simulate Cure Rate Models in R?; January 29, 2017 at 3:07 PM
Michal Figurski said...: This is exactly what I was looking for - a simple and straightforward solution, not using some "black box" package, just plain weibull. Thank you for this post!; June 15, 2018 at 8:07 AM
Anonymous said...: Why are there negative signs before the beta for linpred? I have 4 independent variables and betas, will the equation be
linpred = exp (-beta1*x1+beta2*x2+beta3*x3+beta4*x4+beta5*x5);
where beta1 is the intercept.; February 17, 2019 at 10:29 PM
Nick Horton said...: The negative signs are there because the correct function is exp^{-XBeta). So you should carry the negative sign throughout your linear terms (beta2*x2, etc.); February 18, 2019 at 6:25 AM

SAS and R

Catalogs of posts

Tuesday, March 30, 2010

Example 7.30: Simulate censored survival data

14 comments:

About SAS and R

Topics discussed