Monday, April 2, 2012

Example 9.25: It's been a mighty warm winter? (Plot on a circular axis)



Updated (see below)


People here in the northeast US consider this to have been an unusually warm winter. Was it?

The University of Dayton and the US Environmental Protection Agency maintain an archive of daily average temperatures that's reasonably current. In the case of Albany, NY (the most similar of their records to our homes in the Massachusetts' Pioneer Valley), the data set as of this writing includes daily records from 1995 through March 12, 2012.

In this entry, we show how to use R to plot these temperatures on a circular axis, that is, where January first follows December 31st. We'll color the current winter differently to see how it compares. We're not aware of a tool to enable this in SAS. It would most likely require a bit of algebra and manual plotting to make it work.

R
The work of plotting is done by the radian.plot() function in the plotrix package. But there are a number of data management tasks to be employed first. Most notably, we need to calculate the relative portion of the year that's elapsed through each day. This is trickier than it might be, because of leap years. We'll read the data directly via URL, which we demonstrate in Example 8.31. That way, when the unseasonably warm weather of last week is posted, we can update the plot with trivial ease.

library(plotrix)
temp1 = read.table("http://academic.udayton.edu/kissock/http/
Weather/gsod95-current/NYALBANY.txt")
leap = c(0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1)
days = rep(365, 18) + leap
monthdays = c(31,28,31,30,31,30,31,31,30,31,30,31)
temp1$V3 = temp1$V3 - 1994

The leap, days, and monthdays vectors identify leap years, count the corrrect number of days in each year, and have the number of days in the month in non-leap years, respectively. We need each of these to get the elapsed time in the year for each day. The columns in the data set are the month, day, year, and average temperature (in Fahrenheit). The years are renumbered, since we'll use them as indexes later.

The yearpart() function, below, counts the proportion of days elapsed.

yearpart = function(daytvec,yeardays,mdays=monthdays){
part = (sum(mdays[1:(daytvec[1]-1)],
(daytvec[1] > 2) * (yeardays[daytvec[3]]==366))
+ daytvec[2] - ((daytvec[1] == 1)*31)) / yeardays[daytvec[3]]
return(part)
}

The daytvec argument to the function will be a row from the data set. The function works by first summing the days in the months that have passed (,sum(mdays[1:(daytvec[1]-1)]) adding one if it's February and a leap year ((daytvec[1] > 2) * (yeardays[daytvec[3]]==366))). Then the days passed so far in the current month are added. Finally, we subtract the length of January, if it's January. This is needed, because sum(1:0) = 1, the result of which is that that January is counted as a month that has "passed" when the sum() function quoted above is calculated for January days. Finally, we just divide by the number of days in the current year.

The rest is fairy simple. We calculate the radians as the portion of the year passed * 2 * pi, using the apply() function to repeat across the rows of the data set. Then we make matrices with time before and time since this winter started, admittedly with some ugly logical expressions (section 1.14.11), and use the radian.plot() function to make the plots. The options to the function are fairly self-explanatory.

temp2 = as.matrix(temp1)
radians = 2* pi * apply(temp2,1,yearpart,days,monthdays)

t3old = matrix(c(temp1$V4[temp1$V4 != -99 & ((temp1$V3 < 18) | (temp1$V2 < 12))],
radians[temp1$V4 != -99 & ((temp1$V3 < 18) | (temp1$V2 < 2))]),ncol=2)

t3now= matrix(c(temp1$V4[temp1$V4 != -99 &
((temp1$V3 == 18) | (temp1$V3 == 17 & temp1$V1 == 12))],
radians[temp1$V4 != -99 & ((temp1$V3 == 18) |
(temp1$V3 == 17 & temp1$V1 == 12))]),ncol=2)
# from plottrix library
radial.plot(t3old[,1],t3old[,2],rp.type="s", point.col = 2, point.symbols=46,
clockwise=TRUE, start = pi/2, label.pos = (1:12)/6 * (pi),
labels=c("February 1","March 1","April 1","May 1","June 1",
"July 1","August 1","September 1","October 1","November 1",
"December 1","January 1"), radial.lim=c(-20,10,40,70,100))

radial.plot(t3now[,1],t3now[,2],rp.type="s", point.col = 1, point.symbols='*',
clockwise=TRUE, start = pi/2, add=TRUE, radial.lim=c(-20,10,40,70,100))

The result is shown at the top. The dots (point.symbol is like pch so 20 is a point (section 5.2.2) show the older data, while the asterisks are the current winter. An alternate plot can be created with the rp.type="p" option, which makes a line plot. The result is shown below, but the lines connecting the dots get most of the ink and are not what we care about today.

Either plot demonstrates clearly that a typical average temperature in Albany is about 60 to 80 in August and about 10 to 35 in January, the coldest monthttp://www.blogger.com/img/blank.gifh.

Update
The top figure shows that it has in fact been quite a warm winter-- most of the black asterisks are near the outside of the range of red dots. Updating with more recent weeks will likely increase this impression. In the first edition of this post, the radial.lim option was omitted, which resulted in different axes in the original and "add" calls to radial.plot. This made the winter look much cooler. Many thanks to Robert Allison for noticing the problem in the main plot. Robert has made many hundreds of beautiful graphics in SAS, which can be found here. He also has a book. Robert also created a version of the plot above in SAS, which you can find here, with code here. Both SAS and R (not to mention a host of other environments) are sufficiently general and flexible that you can do whatever you want to do-- but varying amounts of expertise might be required.

An unrelated note about aggregators
We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers and PROC-X with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

10 comments:

Rick Wicklin said...

I suppose by "a tool in SAS" you mean a built-in function to call. No, there isn't one, but it wouldn't be hard to do the data prep in SAS/IML and then output a data set suitable for PROC SGPLOT. The polar transformation is easy; the axis/labels require slightly more work.

A more relevant question is whether there is an alternative ways to visualize these data that might make it easier to compare months and visualize seasonal trends. I think I'd use a (folded) scatter plot with a periodic smoother, similar to Fig 7 in my recent JCGS article. Go to http://amstat.tandfonline.com/doi/abs/10.1198/jcgs.2011.2de and click the "Supplemental" tab to see the PDF with Fig 7.

Ken Kleinman said...

Rick, thanks for that. I agree it's mainly the labels and axes that would be a hassle.

It wasn't my purpose to compare months or seasonal effects, but I'm sure others would be curious to see your plot. Can you post a direct link to a copy, or a similar image? If I follow the link, I see a "Supplementary" tab but can't get through it without buying access. For myself, I have access, but I'd guess other readers might lack it.

Rick Wicklin said...

Sure. There's an almost identical version on the SAS Web site:
http://support.sas.com/rnd/app/papers/airlinedelays.pdf

Also, a colleague of mine asked me to post the following links to some graphs that he has created using SAS/GRAPH. To see the corresponding SAS file, hack off the .htm extension and put .sas. I don't use SAS/GRAPH myself, but I know you do, so you might find the SAS code as interesting as the graphs themselves. I like the Antartica one!

http://sww.sas.com/~realliso/democd17/wind.htm
http://sww.sas.com/~realliso/democd21/antarc.htm
http://sww.sas.com/~realliso/democd4/lead.htm

Ken Kleinman said...

Thanks-- I'm a fan of Rob Allison's work and have spent many hours paging through his stuff. The links above didn't work for me, but oy can find them at http://robslink.com/SAS/ with the same extensions listed in Rick's comment.

Wayne said...

You can also do this with R's ggplot2 graphics:


library (ggplot2)
library (lubridate)
library (scales)

temp1 <- read.table ("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/NYALBANY.txt")

temp <- temp1
names (temp) <- c("month", "day", "year", "temp")

temp$this.year <- temp$year == 2012
temp[temp$temp < 0, "temp"] <- 0
temp$date <- ISOdate (temp$year, temp$month, temp$day)


ggplot (data=temp, aes (x=date, y=temp)) + geom_point (aes (x=yday (date), color=this.year)) + scale_x_date (labels=date_format ("%b"), breaks=date_breaks ("month")) + coord_polar ()

which gives a similar graph without all the calculations. You can drop the coord_polar to see it with rectangular coordinates.

Robert Allison said...

There are several ways to create a plot similar to this in SAS. In order to create a plot almost exactly the same, I have chosen to work out the geometry with equations, and ‘draw’ the chart with our annotate ‘pie’ and text ‘label’ functions. Here is my SAS graph version, and the code I used to create it:

http://robslink.com/SAS/democd55/albany_ny_circular.htm
http://robslink.com/SAS/democd55/albany_ny_circular.sas

I have one question about the original R graph … I notice several of the ‘current winter’ * markers are showing up below the zero degree circle, but when I look at the actual temperature values in the text file (http://academic.udayton.edu/kissock/http/Weather/gsod95-current/NYALBANY.txt ) there have been no sub-zero temperatures in Albany NY this winter (01dec2011-present). The red dots look ~correct in the R graph, but all the *’s seem to be shifted in to a lower temperature than they should be (which makes the current winter look colder than it really was). Perhaps the red dots and the *’s have each been scaled separately (one based on the center of the circle being zero, and the other with the center being some negative value below zero, and then both set of points plotted onto the same axes)?? I think this bears looking into, to determine the underlying cause.

Rick Wicklin said...

Argh! I posted the internal SAS version by mistake!
http://robslink.com/SAS/democd17/wind.htm
http://robslink.com/SAS/democd21/antarc.htm
http://robslink.com/SAS/democd4/lead.htm

Ken Kleinman said...

Your image is very nice, Rob, and those black dots look a lot hotter.

And good on you for following up on the data. I saw the 0 value and figured there had been a cold day, without checking. Will follow up on this ASAP.

Ken Kleinman said...

I updated the plot to correct the error. Thanks again for your diligence, and your correct diagnosis of the problem. I'll be following up with the folks who maintain the plotrix package-- it would be nice if the add=TRUE calls could use the previous axis.

Ken Kleinman said...

Nice! Thanks, Wayne. I've still not made the time investment to learn ggplot2.