## Tuesday, March 30, 2010

### Example 7.30: Simulate censored survival data

To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring.

We simulate both event times from a Weibull distribution with a scale parameter of 1 (this is equivalent to an exponential random variable). The event time has a Weibull shape parameter of 0.002 times a linear predictor, while the censoring time has a Weibull shape parameter of 0.004. A scale of 1 implies a constant (exponential) baseline hazard, but this can be modified by specifying other scale parameters for the Weibull random variables.

First we'll simulate the data, then we'll fit a Cox proportional hazards regression model (section 4.3.1) to see the results.

Simulation is relatively straightforward, and is helpful in concretizing the notation often used in discussion survival data. After setting some parameters, we generate some covariate values, then simply draw an event time and a censoring time. The minimum of these is "observed" and we record whether it was the event time or the censoring time.

SAS

`data simcox;  beta1 = 2;  beta2 = -1;  lambdat = 0.002; *baseline hazard;  lambdac = 0.004; *censoring hazard;  do i = 1 to 10000;    x1 = normal(0);    x2 = normal(0);    linpred = exp(-beta1*x1 - beta2*x2);    t = rand("WEIBULL", 1, lambdaT * linpred);    * time of event;    c = rand("WEIBULL", 1, lambdaC);           * time of censoring;    time = min(t, c);    * which came first?;    censored = (c lt t);    output;  end;run;`

The phreg procedure (section 4.3.1) will show us the effects of the censoring as well as the results of fitting the regression model. We use the ODS system to reduce the output.

`ods select censoredsummary parameterestimates;proc phreg data=simcox;  model time*censored(1) = x1 x2;run;The PHREG ProcedureSummary of the Number of Event and Censored Values                                      Percent   Total       Event    Censored    Censored   10000        5971        4029       40.29          Analysis of Maximum Likelihood Estimates                 Parameter    StandardParameter  DF    Estimate       Error  Chi-Square  Pr > ChiSqx1          1     1.98628     0.02213   8059.0716      <.0001x2          1    -1.01310     0.01583   4098.0277      <.0001Analysis of Maximum Likelihood Estimates              HazardParameter     Ratiox1            7.288x2            0.363`

R

`n = 10000beta1 = 2; beta2 = -1lambdaT = .002 # baseline hazardlambdaC = .004  # hazard of censoringx1 = rnorm(n,0)x2 = rnorm(n,0)# true event timeT = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) C = rweibull(n, shape=1, scale=lambdaC)   #censoring timetime = pmin(T,C)  #observed time is min of censored and trueevent = time==T   # set to 1 if event is observed`

Having generated the data, we assess the effects of censoring with the table() function (section 2.2.1) and load the survival() library to fit the Cox model.

`> table(event)eventFALSE  TRUE  4083  5917 `

`> library(survival)> coxph(Surv(time, event)~ x1 + x2, method="breslow")Call:coxph(formula = Surv(time, event) ~ x1 + x2, method = "breslow")    coef exp(coef) se(coef)     z px1  1.98     7.236   0.0222  89.2 0x2 -1.02     0.359   0.0160 -64.2 0Likelihood ratio test=11369  on 2 df, p=0  n= 10000 `

These parameters result in data where approximately 40% of the observations are censored. The parameter estimates are similar to the true parameter values.

Anonymous said...

proc lifereg data =simcox;
model t*censored(1) =x1 x2/dist =weibull;
run;

I find that the estiamted beta1 and beta2 are not mathced to those in simulation. i think i missed something. do you have any comments?

Ken Kleinman said...

I see two potential problems. First, you have "t" in the model statement, rather than "time." This is the uncensored version. You need either

model t =x1 x2 / dist =weibull;

or

model time*censored(1) =x1 x2 / dist =weibull;

otherwise the model won't fit very well.

The other issue is that the models are interpreted very differently. The parameter estimate from PHREG is the log relative hazard associated with a one-unit increase in the covariate, while the estimate from LIFEREG is the log relative change in time to event associated with the same one-unit increase.

This means that when we see a positive est estimate in PHREG (increased hazard of event) we should expect a _negative_ estimate in LIFEREG (sooner occurrence of event).

Anonymous said...

Cool. I got it. Maybe the following question is more basic but sort of confused to me:

linpred = exp(-beta1*x1 - beta2*x2);

I don’t see intercept and measurement error here. On the other hand, in the lifereg model, we indeed have an intercept. Can you illustrate what the estimated intercept stand for?

Thanks.

Ken Kleinman said...

The intercept ought to reflect here exactly what it does in any regression: the expected outcome when the covariates are all set to 0. There's no measurement error, but there's Weibull variability injected directly.

Szilard said...

Could you think a similar simulation scenario for Aalen Additive Hazards model?

I did try everything i could come up,but so far got nowhere.

Anonymous said...

I used the same simulation program, but I got different figures for the event, beta1 and beta2, is it normal to have different figures when the same simulation program is repeated a number of times?

Ken Kleinman said...

Very typical. Unless you set the seed for the pseudo-random number generator, you get a different stream of numbers every time, and the parameter estimates differ. This is one of the great strengths of simulation.

Wen said...

Would you please advise how to check if the estimates for scale and shape are correct? I used the code to generate a weibull distribution with shape=2 and scale=0.0008*(-beta1*x1-beta2*x2). And using the generated dataset to run a survreg model. The transformed shape and beta's are correct, but I have no idea how to check "scale"? What the scale estimate we should expect from the regression model?Is it the original lamdaT or scale=0.0008*(-beta1*x1-beta2*x2)?

Kennedy said...
This comment has been removed by the author.
Kennedy said...

Dear Ken. Your blog is helpful. In your frailty model simulation, time is generated by Weibull i.e t = rand("WEIBULL", 1, lambdaT * linpred); Would you advise on how one can simulate this time so that its roughly say every 3 months i.e baseline, month 3, month 6 etc. Kennedy