Monday, June 28, 2010

Second year of entries!

Hello, readers new and old!

We started adding examples a year ago, in advance of the book's publication.

To mark the occasion, we're closing chapter 7 and starting chapter 8 next week. We've crafted a listing of all entries from the first year and made this available here.

For those wanting to keep score at home, the five entries most often viewed were:


Example 7.11: Plot an empirical cumulative distribution function from scratch

Example 7.8: Plot two empirical cumulative density functions using available tools

Example 7.2: Simulate data from a logistic regression

Example 7.34: Propensity scores and causal inference from observational studies

Example 7.30: Simulate censored survival data


Thanks for reading! We love comments, questions, and suggestions for new examples.

Ken and Nick

Monday, June 21, 2010

Example 7.42: Testing the proportionality assumption



In addition to the non-parametric tools discussed in recent entries, it's common to use proportional hazards regression, (section 4.3.1) also called Cox regression, in evaluating survival data.

It's important in such models to test the proportionality assumption. Below, we demonstrate doing this for a simple model from the HELP data, available at the book web site. Our results below draw on this web page, part of the excellent resources provided by the UCLA ATS site.

SAS

First, we run a proportional hazards regression to assess the effects of treatment on the time to linkage with primary care. (Data were read in and observations with missing values removed in example 7.40.) Only a portion of the results are shown.


proc phreg data=h2;
model dayslink*linkstatus(0)=treat;
output out= propcheck ressch = schres;
run;

Parameter Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

treat 1 1.59103 0.19142 69.0837 <.0001


Being in the intervention group would appear to increase the probability of being linked to primary care. But do the model assumptions hold? The output statement above makes a new data set that contains the Schoenfeld residuals. (Schoenfeld D. Residuals for the proportional hazards regresssion model. Biometrika, 1982, 69(1):239-241.) One assessment of proportional hazards is based on these residuals, which ought to show no association with time if proportionality holds. We can plot them against the time to linkage using proc sgplot (section 5.1.1) and adding a loess curve to assess the relationship.


proc sgplot data=propcheck;
loess x = dayslink y = schres / clm;
run;


From the resulting plot, shown above, there is an indication of a possible problem. One way to assess this is to include a time-varying covariate, an interaction between the suspect predictor(s) and the event time. This can be done easily within proc phreg. If the interaction term is significant, the null hypothesis of proportionality has been rejected.


proc phreg data=h2;
model dayslink*linkstatus(0)=treat treat_time;
treat_time = treat*log(dayslink);
run;

Parameter Standard
Parameter DF Estimate Error Chi-Square Pr > ChiSq

treat 1 4.13105 0.97873 17.8153 <.0001
treat_time 1 -0.61846 0.22569 7.5093 0.0061


The proportional hazards assumption doesn't hold in this case.

R

In R, the time-varying covariate approach is harder to implement. However, the survival library includes a formal test based on the Schoenfeld residuals. (See Therneau and Grambsch, 2000, pages 127-142.) This is accessed by applying the cox.zph() function to the output of the coxph() function. The smallds object used below was created in the previous example. (Some output omitted.)


> library(survival)
> propcheck = coxph(Surv(dayslink, linkstatus) ~ treat, method="breslow")

> summary(propcheck)

coef exp(coef) se(coef) z Pr(>|z|)
treat 1.5910 4.9088 0.1914 8.312 <2e-16 ***

> cox.zph(propcheck)
rho chisq p
treat -0.233 8.47 0.00361


This test agrees with the SAS approach that the assumptions that proportionality holds can be rejected. To understand why, you can plot() the test. The transform='identity" option is used to make results more similar to those obtained with SAS.


plot(cox.zph(propcheck, transform='identity'))

Monday, June 14, 2010

Example 7.41: hazard function plotting



As we continue with our series on survival analysis, we demonstrate how to plot estimated (smoothed) hazard functions.

R

We will utilize the routines available in the muhaz package. Background information on the methods can be found in K.R. Hess, D.M. Serachitopol and B.W. Brown Hazard Function Estimators: A Simulation Study, Statistics in Medicine, 1999: 18(22):3075-3088.

ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = data.frame(dayslink=ds$dayslink,
linkstatus=ds$linkstatus, treat=ds$treat)

# drop subjects with missing data
smallds = na.omit(smallds)

treatds = smallds[smallds$treat==1,]
controlds = smallds[smallds$treat==0,]
rm(ds, smallds) # clean up

library(muhaz)
haztreat = with(treatds, muhaz(dayslink, linkstatus))
hazcontrol = with(controlds, muhaz(dayslink, linkstatus))

plot(haztreat, lwd=2, xlab="Follow-up time (days)")
lines(hazcontrol, lty=2, lwd=2)
legend(200, 0.005, legend=c("Treatment", "Control"),
lty=1:2, lwd=2)

The treatment group has dramatically higher hazard, but this drops appreciably after 6 months. The control group hazard is low, and decreases in a roughly linear fashion.

SAS

Paul Alison includes macros to display estimates from parametric and semiparametric models in Survival Analysis Using SAS (2nd edition). We'll use the smooth macro, which is built to accept output from proc lifetest.

proc import file="c:\book\help.csv"
out=help dbms=dlm;
delimiter=',';
getnames=yes;
run;

data h2;
set help;
if nmiss(dayslink, linkstatus, treat) eq 0;
run;

proc lifetest data=h2 outsurv=allison;
time dayslink*linkstatus(0);
strata treat;
run;

%include "c:/ken/sasmacros/smooth.sas";
%smooth(data=allison, time=dayslink, width=25);

The proc lifetest results (not shown) indicate that group 1 is the control and group 2 is the intervention. The macro uses a simpler smoothing method than that found in R, so that the curve is bumpier and estimates near the edges are not shown.

Monday, June 7, 2010

Example 7.40: Nelson-Aalen plotting




In our previous entry, we described how to calculate the Nelson-Aalen estimate of cumulative hazard.

In this entry, we display the estimates for the time to linkage to primary care for both the treatment and control groups in the HELP study.

R

We use the previously defined function, after removing missing values and sorting by the time to event or censoring.


ds = read.csv("http://www.math.smith.edu/sasr/datasets/help.csv")
smallds = data.frame(dayslink=ds$dayslink,
linkstatus=ds$linkstatus, treat=ds$treat)

# drop subjects with missing data
smallds = na.omit(smallds)

# order by dayslink
smallds = smallds[order(smallds$dayslink),]

rm(ds) # clean up
attach(smallds)
library(survival)
calcna = function(time, event) {
na.fit = survfit(coxph(Surv(time,event)~1),
type="aalen")
jumps = c(0, na.fit$time, max(time))
# need to be careful at the beginning and end
surv = c(1, na.fit$surv, na.fit$surv[length(na.fit$surv)])

# apply appropriate transformation
neglogsurv = -log(surv)

# create placeholder of correct length
naest = numeric(length(time))
for (i in 2:length(jumps)) {
naest[which(time>=jumps[i-1] & time<=jumps[i])] =
neglogsurv[i-1] # snag the appropriate value
}
return(naest)
}

nacontrol = calcna(dayslink[treat==0], linkstatus[treat==0])
natreat = calcna(dayslink[treat==1], linkstatus[treat==1])

plot(dayslink[treat==1], natreat, type="s", col="blue",
ylab="Nelson-Aalen estimate",
xlab="Number of days", lwd=2)
lines(dayslink[treat==0], nacontrol, lty=2, lwd=2,
col="red", type="s")
legend(250, 0.55, legend=c("Intervention", "Control"),
lty=1:2, lwd=2, col=c("blue", "red"))


The time to linkage was much lower for the treatment group than for the control group.

SAS

In SAS we'll use proc lifetest with the strata statement, as in section 5.6.3, removing subjects with missing time, censoring, or treatment indicators using the nmiss function (section 1.4.14) and subsetting if statement (section 1.5.1). We supress all printed output and save the desired data set using ODS commands.


proc import file="c:\book\help.csv"
out=help dbms=dlm;
delimiter=',';
getnames=yes;
run;

data h2;
set help;
if nmiss(dayslink, linkstatus,treat) eq 0;
run;

ods select none;
ods output productlimitestimates=naout1;
proc lifetest data=h2 nelson;
time dayslink*linkstatus(0);
strata treat;
run;
ods select all;


Then we just have to plot the data. We make it a little more self-explanatory by defining a format for the treatment group. Separate curves are requested with the y*x=z syntax, which also produces a legend by default. We use the symbol statement with the stepj interpolation (section 5.1.19) to ask for a step function plot of each curve. The axis statement is used to rotate the title of the y-axis. The label for the y-axis is inherited from proc lifetest.


proc format;
value treat 0="Control" 1="Intervention";
run;

axis1 label = (angle = 90) minor = none order = 0 to 1.1 by .1;
symbol1 i = stepj l = 3 w = 5 c = red;
symbol2 i = stepj l = 1 w = 5 c = blue;
proc gplot data = naout1;
plot cumhaz * dayslink = treat / vaxis = axis1;
label treat = "Treatment group";
format treat treat. cumhaz dayslink 4.1;
run;
quit;