Tuesday, September 14, 2010

Example 8.5: bubble plots part 3



An anonymous commenter expressed a desire to see how one might use SAS to draw a bubble plot with bubbles in three colors, corresponding to a fourth variable in the data set (x, y, z for bubble size, and the category variable). In previous entries we discussed bubble plots and showed how to make the bubbles print in two colors depending on a fourth dichotomous variable.

The SAS approach used in those entries cannot be extended to a fourth variable with many values, so we show here a different approach to generating this output. The R version below represents a trivial extension of the code demonstrated earlier.

SAS

We'll start by making some data-- 20 observations in each of 3 categories.

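data testbubbles;
  do cat = 1 to 3;                /* three categories */
    do i = 1 to 20;               /* 20 observations per category */
      abscissa = normal(0);       /* seed 0 uses the system clock */
      ordinate = normal(0);
      z = uniform(0);             /* drives the bubble size */
      output;
    end;
  end;
run;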

Our approach will be to make an annotate data set using the annotate macros (section 5.2). The %slice macro easily draws filled circles. Check its documentation for full details on the parameters it needs in the on-line help: SAS Products; SAS/GRAPH; The Annotate Facility; Annotate Dictionary. Here we note that the 5th parameter is the radius of the circle, chosen here as an arbitrary function of z that makes pleasingly sized circles. Other parameters reflect color density, arc, and starting angle, which could be used to represent additional variables.

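%annomac;
data annobub1;
  set testbubbles;
  /* x and y in data coordinates; sizes as a percentage of the graphics area */
  %system(2,2,3);
  /* full circle (0 to 360 degrees), radius sqrt(3*z), solid green fill */
  %slice(abscissa, ordinate, 0, 360, sqrt(3*z), green, ps, 0);
run;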

Unfortunately, due to a quirk of the macro facility, I don't think the color can be changed conditionally in the preceding step. Instead, we need a new data step to do this.

data annobub2;
  set annobub1;
  /* cat=1 keeps the green assigned by %slice */
  if cat=2 then color="red";
  if cat=3 then color="blue";
run;

Now we're ready to plot. We use the symbol (section 5.2.2) statement to tell proc gplot not to plot the data, add the annotate data set, and suppress the legend, as the default legend will not look correct here. An appropriate legend could be generated with a legend statement.

symbol1 i=none r=3;
proc gplot data=testbubbles;
  plot ordinate * abscissa = cat / annotate = annobub2 nolegend;
run;
quit;

The resulting plot is shown above. Improved axes are demonstrated throughout the book and in many previous blog posts.

R

The R approach merely requires passing a vector of colors, one per observation and drawn from three values, to the bg option in the symbols() function. To mimic SAS, we'll start by defining some data, then generate the vector of colors needed.

cat = rep(c(1, 2, 3), each=20)
abscissa = rnorm(60)
ordinate = rnorm(60)
z = runif(60)
plotcolor = ifelse(cat==1, "green", ifelse(cat==2, "red", "blue"))

The nested calls to the ifelse function (section 1.11.2) allow vectorized conditional tests with more than two possibilities. Another option would be to use a for loop (section 1.11.1) but this would be avoiding one of the strengths of R. In this example, I suppose I could have defined the cat vector with the color values as well, and saved some keystrokes.
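As a rough sketch of the alternatives mentioned above, the code below compares the nested ifelse() with a named lookup vector and with an explicit for loop. The names colormap, lookupcolor, and loopcolor are introduced here purely for illustration and do not appear in the original code.

```r
# Categories as in the example: 20 observations in each of 3 groups
cat <- rep(c(1, 2, 3), each = 20)

# Nested ifelse(), as in the post
plotcolor <- ifelse(cat == 1, "green", ifelse(cat == 2, "red", "blue"))

# Alternative 1: named lookup vector (colormap is a hypothetical name)
colormap <- c("1" = "green", "2" = "red", "3" = "blue")
lookupcolor <- unname(colormap[as.character(cat)])

# Alternative 2: explicit for loop, shown for comparison only
loopcolor <- character(length(cat))
for (i in seq_along(cat)) {
  loopcolor[i] <- if (cat[i] == 1) "green" else if (cat[i] == 2) "red" else "blue"
}

# Both comparisons check that the approaches agree element by element
identical(plotcolor, lookupcolor)
identical(plotcolor, loopcolor)
```

The lookup vector scales more gracefully than nested ifelse() calls when the category variable has many values, since adding a category means adding one element rather than another level of nesting.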

With the data generated and the color vector prepared, we need only call the symbols() function.

symbols(abscissa, ordinate, circles=z, inches=1/5, bg=plotcolor)

The resulting plot is shown below.
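One difference between the two versions may be worth noting: the SAS code uses sqrt(3*z) as the radius, so bubble area grows in proportion to z, while the R call passes z itself to circles=, which symbols() treats as radii. The following is a minimal sketch, not the original code, that takes the square root in R as well and adds a legend. The seed, the helper names catcolors and radius, and the legend labels are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
cat <- rep(c(1, 2, 3), each = 20)
abscissa <- rnorm(60)
ordinate <- rnorm(60)
z <- runif(60)

catcolors <- c("green", "red", "blue")  # hypothetical helper name
plotcolor <- catcolors[cat]             # one color per observation

# circles= gives radii, so sqrt(z) makes bubble area proportional to z
radius <- sqrt(z)
symbols(x = abscissa, y = ordinate, circles = radius, inches = 1/5,
        bg = plotcolor, xlab = "abscissa", ylab = "ordinate")

# pch = 21 is a fillable circle; pt.bg sets its fill color
legend("topright", legend = paste("cat", 1:3), pch = 21,
       pt.bg = catcolors, title = "Category")
```

Whether radius or area should carry the third variable is a design choice; scaling area is often suggested because readers tend to compare bubbles by area.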

9 comments:

Anonymous said...

I would rather use indexing to assign the color vector:
plotcolor = c("green","red","blue")[cat]

Ken Kleinman said...

Nice!

Anonymous said...

Thx for this. I was trying to find ways to plot bubble plots in R and it was hard to find. Now I know :)

Ken Kleinman said...

Happy to be here, Anonymous. Check out the linked earlier entries or the documentation for symbols() to see a whole bunch of similar cool things to do.

Richard Thornton said...

Nice to see you guys are active again!

Ken Kleinman said...

The sgplot procedure can also do this easily:

data test;
  do i = 1 to 40;
    cat = ceil(i/10);
    x = normal(0) - cat;
    y = x + normal(0);
    size = normal(0);
    output;
  end;
run;

proc sgplot data = test;
  bubble x=x y=y size=size / group=cat;
run;
quit;

KM said...

Thanks for posting the sgplot proc, that is much easier! Do you know how to specify the different colors? There is the FILLATTRS color option, but I can't seem to figure out how to tell SAS to use more than one color.

Let's say in your example you want category1=red, category2=yellow, etc.

Anonymous said...

data attrmap;
  retain id "myid";
  length fillcolor $ 10;
  input value $ fillcolor $;
  datalines;
1 green
2 gray
3 blue
4 red
;
run;
proc sgplot data = test dattrmap=attrmap;
  bubble x=x y=y size=size / group=cat attrid=myid;
run;
quit;

Anonymous said...

I've been looking for an answer to the same question for quite a few hours. Thanks a lot, it really helped!