Showing posts with label 3D plots. Show all posts

Saturday, March 27, 2010

Example 7.29: Bubble plots colored by a fourth variable

In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simple code for this in R and hinted at a SAS solution for two colors. Here we show in detail what the SAS code would look like, and revisit the R code.


SAS

For SAS, we have to make two separate variables: one holding the CESD score for the females, and another for the males. Each gender-specific variable is missing for observations of the other gender. We'll do this using conditioning (section 1.11.2).


libname k "c:\book";

data twocolors;
set k.help;
if female eq 1 then femalecesd = cesd;
else malecesd = cesd;
run;


Now we can use the bubble2 statement (close kin of the plot2 statement, section 5.1.2) to add both gender-specific variables to the plot. While we're at it, we relabel the left y-axis (the malecesd variable) so that it is no longer gender specific, and specify that the right y-axis is not to be labeled.


proc gplot data = twocolors;
bubble malecesd*age=i1 / bscale = radius bsize=200
bcolor = blue bfill = solid;
bubble2 femalecesd*age=i1 / bscale = radius bsize = 200
bcolor = pink bfill = solid noaxis;
label malecesd="CESD";
run;


As in the previous bubble plot example, the scale is manipulated arbitrarily so that the SAS and R figures are similar.

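We're somewhat fortunate here that the ranges of the two gender-specific CESD scores are similar: bubble2 plots against the right-hand axis, so if the ranges differed, the two sets of bubbles would sit on different vertical scales.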

R

In the comments for Example 7.28, we suggested the following simple R code.


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
femalealc = subset(ds, female==1 & substance=="alcohol")
malealc = subset(ds, female==0 & substance=="alcohol")
with(malealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="blue"))
with(femalealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="pink", add=TRUE))


While this does generate a plot, it could be misleading, in that the circle sizes are scaled relative to the largest value within each symbols() call, so the same number of drinks can be drawn at different sizes for men and women. While this could be desirable, it's more likely that we'd like a single scale for the circles. R code for this can be written as a single statement:


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
# with() looks up age, cesd, i1, and female inside ds
with(ds, symbols(age, cesd, circles=i1, inches=1/5,
bg=ifelse(female==1, "pink", "blue")))


Here the ifelse() function (section 1.11.2) generates a different circle fill color depending on the value of female.
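The ifelse() approach covers two groups. For a variable with more than two levels, one possible sketch is a named lookup vector indexed by the grouping variable. The data below are simulated stand-ins rather than the HELP data, and the level names and colors are assumptions chosen only for illustration.

```r
# Simulated stand-in data (hypothetical values, not the HELP data set)
set.seed(3)
n <- 30
sim <- data.frame(
  age = round(runif(n, 20, 55)),
  cesd = round(runif(n, 5, 55)),
  i1 = round(runif(n, 1, 30)),
  substance = sample(c("alcohol", "cocaine", "heroin"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Named vector: one fill color per level (colors are arbitrary choices)
palette_by_substance <- c(alcohol = "blue", cocaine = "orange",
                          heroin = "darkgreen")

# Indexing by the character column returns one color per observation
fill <- unname(palette_by_substance[sim$substance])

symbols(sim$age, sim$cesd, circles = sim$i1, inches = 1/5, bg = fill,
        xlab = "age", ylab = "cesd")
legend("topright", legend = names(palette_by_substance),
       pt.bg = palette_by_substance, pch = 21)
```

Because everything is drawn in one symbols() call, the circles share a single size scale, just as in the two-color version.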

The resulting plots are shown below.
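If the two-call version is preferred, for example to style each group separately, a common circle scale can still be obtained by adjusting inches in each call. The sketch below relies on the documented behavior that a numeric inches value sets the size of the largest symbol in that call. The data are simulated, and group_inches() is a hypothetical helper, not part of R.

```r
# Simulated stand-in data (hypothetical values, not the HELP data set);
# males are given a wider range of drinks so the two group maxima differ
set.seed(1)
n <- 40
sim <- data.frame(
  age = round(runif(n, 20, 55)),
  cesd = round(runif(n, 5, 55)),
  i1 = c(round(runif(n / 2, 1, 40)), round(runif(n / 2, 1, 15))),
  female = rep(c(0, 1), each = n / 2)
)

base_inches <- 1/5            # size given to the largest circle overall
overall_max <- max(sim$i1)

# Hypothetical helper: shrink 'inches' in proportion to the group's own
# maximum, so each call uses the same inches-per-drink factor
group_inches <- function(d) base_inches * max(d$i1) / overall_max

males <- subset(sim, female == 0)
females <- subset(sim, female == 1)

# Fix the axis limits from the full data so neither group is clipped
with(males, symbols(age, cesd, circles = i1, inches = group_inches(males),
                    bg = "blue", xlim = range(sim$age),
                    ylim = range(sim$cesd)))
with(females, symbols(age, cesd, circles = i1,
                      inches = group_inches(females),
                      bg = "pink", add = TRUE))
legend("topright", legend = c("male", "female"),
       pt.bg = c("blue", "pink"), pch = 21)
```

The single-statement version remains simpler. This variant is mainly useful when the groups need different options beyond the fill color.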


Monday, March 22, 2010

Example 7.28: Bubble plots

A bubble plot is a means of displaying 3 variables in a scatterplot. The z dimension is presented in the size of the plot symbol, typically a circle. The area or radius of the circle plotted is proportional to the value of the third variable. This can be a very effective data presentation method. For example, consider Andrew Gelman's recent re-presentation of health expenditure/survival data/annual number of doctor visits per person. On the other hand, Edward Tufte suggests that such representations are ambiguous, in that it is often unclear whether the area, radius, or height reflects the third variable. In addition, he reports that humans tend not to be good judges of relative area.

However, other means of presenting three dimensions on a flat screen or piece of paper often rely on visual cues regarding perspective, which some find difficult to judge.

Here we demonstrate SAS and R bubble plots using the HELP data set used in our book. We show a plot of depression by age, with bubble size proportional to the average number of drinks per day. To make the plot a little easier to read, we show this only for female alcohol abusers.

SAS

In SAS, we can use the bubble statement in proc gplot. We demonstrate here the use of the where data set option (section 1.5.1) for subsetting, which allows us to avoid using any data steps. SAS allows the circle area or radius to be proportional to the third variable; we choose the radius for compatibility with R. We alter the size of the circles for the same reason. We also demonstrate options for coloring in the filled circles.


libname k "c:\book";

proc gplot data = k.help (where=((female eq 1)
and (substance eq "alcohol")));
bubble cesd*age=i1 / bscale = radius bsize=60
bcolor=blue bfill=solid;
run;



R

In R, we can use the symbols() function for the plot. Here we also demonstrate reading in data previously saved in native R format (section 1.1.1), as well as the subset() function and the with() function (the latter appears in section 1.3.1). The inches option is an arbitrary scale factor. We note that the symbols() function has a great deal of additional capability: it can substitute squares for circles for plotting the third variable, and add additional dimensions with rectangles or stars. Proportions can be displayed with thermometers, and boxplots can also be displayed.


load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
femalealc = subset(ds, female==1 & substance=="alcohol")
with(femalealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="blue"))


The results are shown below. It appears that younger women with more depressive symptoms tend to report more drinking.
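The circles argument of symbols() is interpreted as radii, so the R plot above makes the radius proportional to the number of drinks. If area-proportional bubbles are preferred, one option is to pass a square-root transform of the third variable. The sketch below uses simulated stand-in data rather than the HELP data, and radius_for_area is a hypothetical name.

```r
# Simulated stand-in data (hypothetical values, not the HELP data set)
set.seed(2)
n <- 25
sim <- data.frame(
  age = round(runif(n, 20, 55)),
  cesd = round(runif(n, 5, 55)),
  i1 = round(runif(n, 1, 30))
)

# A radius of sqrt(value / pi) gives a circle whose area equals the value
radius_for_area <- sqrt(sim$i1 / pi)

# 'inches' still rescales all circles together, so areas stay proportional
symbols(sim$age, sim$cesd, circles = radius_for_area, inches = 1/5,
        bg = "blue", xlab = "age", ylab = "cesd")
```

With this transform, a woman reporting four times as many drinks gets a circle with four times the area (twice the radius), rather than four times the radius.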