Monday, June 21, 2010

Example 7.42: Testing the proportionality assumption

In addition to the non-parametric tools discussed in recent entries, it's common to use proportional hazards regression, (section 4.3.1) also called Cox regression, in evaluating survival data.

It's important in such models to test the proportionality assumption. Below, we demonstrate doing this for a simple model from the HELP data, available at the book web site. Our results below draw on this web page, part of the excellent resources provided by the UCLA ATS site.


First, we run a proportional hazards regression to assess the effects of treatment on the time to linkage with primary care. (Data were read in and observations with missing values removed in example 7.40.) Only a portion of the results are shown.

proc phreg data=h2;
model dayslink*linkstatus(0)=treat;
output out= propcheck ressch = schres;

Parameter Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

treat 1 1.59103 0.19142 69.0837 <.0001

Being in the intervention group would appear to increase the probability of being linked to primary care. But do the model assumptions hold? The output statement above makes a new data set that contains the Schoenfeld residuals. (Schoenfeld D. Residuals for the proportional hazards regresssion model. Biometrika, 1982, 69(1):239-241.) One assessment of proportional hazards is based on these residuals, which ought to show no association with time if proportionality holds. We can plot them against the time to linkage using proc sgplot (section 5.1.1) and adding a loess curve to assess the relationship.

proc sgplot data=propcheck;
loess x = dayslink y = schres / clm;

From the resulting plot, shown above, there is an indication of a possible problem. One way to assess this is to include a time-varying covariate, an interaction between the suspect predictor(s) and the event time. This can be done easily within proc phreg. If the interaction term is significant, the null hypothesis of proportionality has been rejected.

proc phreg data=h2;
model dayslink*linkstatus(0)=treat treat_time;
treat_time = treat*log(dayslink);

Parameter Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

treat 1 4.13105 0.97873 17.8153 <.0001
treat_time 1 -0.61846 0.22569 7.5093 0.0061

The proportional hazards assumption doesn't hold in this case.


In R, the time-varying covariate approach is harder to implement. However, the survival library includes a formal test based on the Schoenfeld residuals. (See Therneau and Grambsch, 2000, pages 127-142.) This is accessed by applying the cox.zph() function to the output of the coxph() function. The smallds object used below was created in the previous example. (Some output omitted.)

> library(survival)
> propcheck = coxph(Surv(dayslink, linkstatus) ~ treat, method="breslow")

> summary(propcheck)

coef exp(coef) se(coef) z Pr(>|z|)
treat 1.5910 4.9088 0.1914 8.312 <2e-16 ***

> cox.zph(propcheck)
rho chisq p
treat -0.233 8.47 0.00361

This test agrees with the SAS approach that the assumptions that proportionality holds can be rejected. To understand why, you can plot() the test. The transform='identity" option is used to make results more similar to those obtained with SAS.

plot(cox.zph(propcheck, transform='identity'))


Anonymous said...

Great post! one question though, isn't it supposed to be weighted Schoenfeld residuals, i.e. "wtressch = schres" instead of "ressch=schres" in the output statement?

Anonymous said...


Ken Kleinman said...

The assess statement in proc phreg can also be used for assessing the proportionality assumption.

stat4444 said...

Is there also a possibility to test the proportionality assumption for a discrete-time hazard model?

Anonymous said...

I don't think that "In R the time-varying covariate approach is harder to implement". I believe you just add an interaction term with time, just as you did in SAS.

Ken Kleinman said...

The thing about that interaction is that (IIUC) it needs to be calculated at every censoring time and every failure time in the data set. The SAS code calculates the interactions within the PHREG procedure to do this. If you just calculate a static one-time variable, you'll get a different and wrong answer.

I'm 100% sure it's possible to do this analysis in R-- it's just that for the purposes of assessing proportionality, the approach we show here is easier to implement.

Vishal Bali said...

Thanks for the post. It is really helpful. I performed survival analysis and KM curve cross each other suggesting that PH assumption is not met but my schenfeld test is not significant. So, could you please tell me what can be the reason that Schoenfeld test is not significant even though KM curves are crossing?

Ken Kleinman said...

Without seeing the data, my guess would be a lack of power. If the sample size (or number of events) is small, I'd be pretty concerned about the proportionality assumption.

Vishal Bali said...
This comment has been removed by the author.
Vishal Bali said...

Thanks Ken for your reply! I am looking at antidepressant use and risk of dementia in a propensity score matched sample. There are 732 patients in each group (Parox and other).

The percentage of censoring is 92.3% (677/732) in Parox group and 92.4% (676/732) in other group. This is a propensity score matched sample. I only controlled for osteoporosis in the adjusted analysis as it was significant even after matching. I only tested for Schoenfeld residuals using supermum test using 1000 replications(p=0.0650). I checked for KM curves using log rank test (p=0.635).

I also feel that it is a sample size issue as 55/732 had dementia in the Parox group and 56/732 got dementia in the other group in the matched sample.

Vishal Bali said...

Does anyone know how to create a heaviside function in SAS when the predictor variable has 3 categories?

Awa said...

Hello! This is so helpful! So I know that my data fails the proportionality assumption, and it fails around day 90 when the survival curves cross on a Kaplan-Meier graph. I want to fit 2 models, one early and one late-- but how do I censor the time intervals without excluding patients?

Dana Cullen said...
This comment has been removed by the author.
Anonymous said...


Thanks for this post, but what would you do if you have age as the time scale? I have data set up as a single individual per row with the model statment written as:
(age_in, age_out)*no_deaths (0)=drug_type

How do you assess proportional hazards in this case? I can't put in (age_out-age_in) as the x in the sgplot command...


March 23, 2017