Monday, April 12, 2010

Example 7.32: Add reference lines to a plot; fine control of tick marks

Sometimes it's useful to plot regular reference lines along with the data. In a time-series plot, for example, such lines can show when critical values are reached more clearly than tick marks alone.

As an example, we revisit the empirical CDF plot shown in Example 7.11. If you missed that entry, the data can be downloaded so you can easily explore the code shown below. We'll show how to add both a regular grid of lines and lines at specific x or y values.

In a departure from our usual style, we'll discuss SAS and R in parallel, not in sequence.

Original Plot

The plot in Example 7.11 was produced by first doing some calculations and then running the following code.


SAS

symbol1 i=j v=none c=blue;
proc gplot data=help_a;
plot ecdf_pcs * pcs;
run;


R

plot(sortpcs, ecdfpcs, type="n")
lines(sortpcs, ecdfpcs)


A simple default reference grid can be produced in SAS by adding the grid option to the plot statement, so that it reads plot ecdf_pcs * pcs / grid;. In R, calling the grid() function with no arguments after the plot is drawn has the same effect. Either adds light grey lines at the major tick marks in both the x and y directions. The major tick marks themselves can be selected using the axis statement in SAS or with the axis() function in R, as discussed in sections 5.3.7 and 5.3.8. In SAS, changing the major tick marks moves the grid to the new tick locations, as in the following code.


axis1 order=(0 to 1 by .25) minor=none;
axis2 order=(10 to 80 by 35) minor=none;
symbol1 i=j v=none c=blue;
proc gplot data=help_a;
plot ecdf_pcs * pcs / vaxis=axis1 haxis=axis2 grid;
run;
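For comparison, a minimal R sketch of the default grid. It again uses simulated stand-ins for sortpcs and ecdfpcs (hypothetical values). With its defaults, grid() draws light gray dotted lines aligned with the tick marks that axTicks() reports for the default axes, which is why it is hard to line up with custom tick locations.

```r
# Simulated stand-ins for sortpcs and ecdfpcs (hypothetical values)
set.seed(42)
sortpcs <- sort(runif(100, min = 15, max = 75))
ecdfpcs <- seq_along(sortpcs) / length(sortpcs)

plot(sortpcs, ecdfpcs, type = "n")
lines(sortpcs, ecdfpcs)

# No arguments: lines at the default major tick marks in both directions
grid()

# The default tick locations the grid follows
xt <- axTicks(1)  # x axis
yt <- axTicks(2)  # y axis
print(xt)
print(yt)
```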



Unfortunately, it is difficult to get the grid() function to line up with tick marks placed anywhere other than the default locations. A better approach in R is to use the abline() function (section 5.2.1) to draw the reference lines exactly where you want them. In the following code, we also demonstrate customization of the tick marks to match the SAS output shown above.


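# sortpcs and ecdfpcs are computed as in Example 7.11
plot(sortpcs, ecdfpcs, type="n", xaxt="n", yaxt="n", xlim=c(10,80))  # empty frame, no axes
axis(side=1, at=c(10,45,80))      # x-axis ticks matching the SAS haxis
axis(side=2, at=seq(0,1,.25))     # y-axis ticks matching the SAS vaxis
lines(sortpcs, ecdfpcs)
abline(v=c(10,45,80), col="lightgray", lty="dotted")   # vertical reference lines
abline(h=seq(0,1,.25), col="lightgray", lty="dotted")  # horizontal reference lines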


In the above, the options to plot() suppress the data (type="n") and the axes (xaxt="n", yaxt="n") and set the range of the x axis (xlim). The axis() calls specify where the tick marks should appear, and the abline() calls add the reference lines.
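Because the tick positions and the reference-line positions are typed twice in the code above, they can drift apart if one is edited. One way to keep them in step is a small wrapper; ref_axes() below is a hypothetical helper written for this sketch, not part of base R, and the data are simulated stand-ins.

```r
# Simulated stand-ins for sortpcs and ecdfpcs (hypothetical values)
set.seed(42)
sortpcs <- sort(runif(100, min = 15, max = 75))
ecdfpcs <- seq_along(sortpcs) / length(sortpcs)

# Hypothetical helper (not base R): draw tick marks and matching reference
# lines from the same vectors so the two cannot get out of step
ref_axes <- function(xat, yat, col = "lightgray", lty = "dotted") {
  axis(side = 1, at = xat)
  axis(side = 2, at = yat)
  abline(v = xat, col = col, lty = lty)
  abline(h = yat, col = col, lty = lty)
  invisible(list(x = xat, y = yat))
}

plot(sortpcs, ecdfpcs, type = "n", xaxt = "n", yaxt = "n", xlim = c(10, 80))
res <- ref_axes(xat = c(10, 45, 80), yat = seq(0, 1, 0.25))
lines(sortpcs, ecdfpcs)  # drawn after the grid, so the data sit on top
```

Calling ref_axes() before lines() keeps the reference lines underneath the ECDF curve; swapping the two calls reproduces the drawing order used in the code above.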

SAS also allows manual specification of reference lines. A result equivalent to the grid option demonstrated above can be obtained as follows.


axis1 order=(0 to 1 by .25) minor=none;
axis2 order=(10 to 80 by 35) minor=none;
symbol1 i=j v=none c=blue;
proc gplot data=help_a;
plot ecdf_pcs * pcs / vaxis=axis1 haxis=axis2 href=10,45,80
vref=0,.25,.5,.75,1 chref=lightgrey cvref=lightgrey;
run;
quit;


The added control offered by the abline() or href/vref approach is that reference lines can trivially be drawn at values that are not major tick marks.
For example, we might have a particular interest in the 90th percentile of the data. To add this line, we can run abline(h=.9, col="lightgrey", lty="dotted") as a separate command in R, or add .9 to the list of values in the vref option in SAS. The final result is shown below.
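Putting the pieces together, a sketch of the complete R code for the final plot, again with simulated stand-in data (hypothetical values). The line color and type are stored in variables so the fainter dotted style and the darker style used for the posted image can be swapped easily. The vertical line at p90 is an optional extra: quantile() with type = 1 inverts the empirical CDF, returning the smallest pcs value at which the ECDF reaches 0.9.

```r
# Simulated stand-in for the pcs variable (hypothetical values)
set.seed(42)
pcs <- runif(100, min = 15, max = 75)
sortpcs <- sort(pcs)
ecdfpcs <- seq_along(sortpcs) / length(sortpcs)

# Reference-line style; for darker lines use refcol <- "grey", reflty <- "solid"
refcol <- "lightgray"
reflty <- "dotted"

plot(sortpcs, ecdfpcs, type = "n", xaxt = "n", yaxt = "n", xlim = c(10, 80))
axis(side = 1, at = c(10, 45, 80))
axis(side = 2, at = seq(0, 1, 0.25))
lines(sortpcs, ecdfpcs)
abline(v = c(10, 45, 80), col = refcol, lty = reflty)
abline(h = seq(0, 1, 0.25), col = refcol, lty = reflty)

# Extra horizontal line at the 90th percentile, not at a major tick mark
abline(h = 0.9, col = refcol, lty = reflty)

# Optional: the pcs value where the ECDF first reaches 0.9
p90 <- quantile(pcs, 0.9, type = 1, names = FALSE)
abline(v = p90, col = refcol, lty = reflty)
```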



(To produce the image above, the abline() calls were modified to drop the lty option and use the color "grey"; otherwise the reference lines were too faint to display well here.)
