SAS and R: Example 7.29: Bubble plots colored by a fourth variable

Saturday, March 27, 2010

Example 7.29: Bubble plots colored by a fourth variable

In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simple code for this in R and hinted at a SAS solution for two colors. Here we show in detail what the SAS code would look like, and revisit the R code.

SAS

For SAS, we have to make two separate variables-- one with the CESD for the females, and another for the males. For the other gender, these gender-specific variables will have missing values. We'll do this using conditioning (section 1.11.2).


libname k "c:\book";

data twocolors;
set k.help;
  if female eq 1 then femalecesd = cesd;
  else malecesd = cesd;
run;

Now we can use the bubble2 statement (close kin of the plot2 statement, section 5.1.2) to add both gender-specific variables to the plot. While we're at it, we relabel the x-axis to no longer be gender specific and specify that the right y-axis is not to be labeled.


proc gplot data = twocolors;
bubble malecesd*age=i1 / bscale = radius bsize=200 
                           bcolor = blue bfill = solid;
bubble2 femalecesd*age=i1 / bscale = radius bsize = 200 
                           bcolor = pink bfill = solid noaxis;
label malecesd="CESD";
run;

As in the previous bubble plot example, the scale is manipulated arbitrarily so that the SAS and R figures are similar.

We're somewhat fortunate here that the range of the two gendered CESD scores are similar

R

In the comments for Example 7.28, we suggested the following simple R code.


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
femalealc = subset(ds, female==1 & substance=="alcohol")
malealc = subset(ds, female==0 & substance=="alcohol")
with(malealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="blue"))
with(femalealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="pink", add=TRUE))

While this does generate a plot, it could be misleading, in that the scale of the circle sizes is relative to the largest value within each symbols() call. While this could be desirable, it's more likely that we'd like a single scale for the circles. R code for this can be made in a single statement:


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
attach(ds)
symbols(age, cesd, circles=i1,inches=1/5, 
             bg=ifelse(female==1,"pink","blue"))

Here the ifelse() function (section 1.11.2) generates a different circle fill color depending on the value of female.

The resulting plots are shown below.

6 comments:

Anonymous said...: How could one extend the example to coloring by a fourth variable with more than two options? Is it also possible to combine it with adding bubble labels by a 5th variable?; July 15, 2012 at 3:28 PM
Anonymous said...: Agree with previous comment, the fourth variable being limited to a cardinality of 2 in sas is hardly useful.; September 16, 2012 at 5:50 PM
Ken Kleinman said...: Please see example 8.5 (http://sas-and-r.blogspot.com/2010/09/example-85-bubble-plots-part-3.html) to see this done, folks. The sgplot prot procedure also does it trivially:

data test;
do i = 1 to 40;
cat = ceil(i/10);
x = normal(0) - cat;
y = x + normal(0);
size = normal(0);
output;
end;
run;

proc sgplot data = test;
bubble x=x y=y size=size / group=cat;
run;
quit;; September 17, 2012 at 8:00 PM
Justin S. A. Perry said...: This is great, thanks. Is there a way to restrict the Z value to limit outliers? All of my points are "significant" but even after log transforming I still have one or two points that are much larger than the others, dwarfing the majority of bubbles.

Thanks!; December 7, 2016 at 5:31 PM
Ken Kleinman said...: Hi Justin--

My first thought would be to handle this on a case-by-case basis, meaning to arbitrarily remove the large values by hand before plotting the data.

But it would be an interesting exercise to construct a function to detect range issues like this. You could also embed the R code in a function and include an option to trim the n largest values before plotting.; December 8, 2016 at 9:34 AM
Justin S. A. Perry said...: The latter was a great suggestion, I was actually able to embed it into a DESeq2 analysis co-opting the way that heatmaps are handle outlier issues and applying it to this. Thanks again.; December 8, 2016 at 10:00 PM

Post a Comment

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."
—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."
—Frances Denny, Journal of the Royal Statistical Society, Series A

About the authors

Nicholas Horton is a Professor of Statistics at Amherst College. He is a biostatistician with expertise in missing data methods, longitudinal regression, statistical computing and statistical education. Nick's home page; Nick's Google Scholar author page

Ken Kleinman is an Associate Professor with the Department of Biostatistics and Epidemiology at the University of Massachusetts, Amherst. He is a consulting biostatistician with expertise in group-randomized trials and disease surveillance; he also offers R training courses. Ken's home page; Ken's Google Scholar author page.

SAS and R

Catalogs of posts

Saturday, March 27, 2010

Example 7.29: Bubble plots colored by a fourth variable

6 comments:

About SAS and R

Topics discussed