SAS
For SAS, we have to make two separate variables: one holding the CESD for the females, and another for the males. Each gender-specific variable will be missing for subjects of the other gender. We'll do this using conditioning (section 1.11.2).
libname k "c:\book";
data twocolors;
set k.help;
if female eq 1 then femalecesd = cesd;
else malecesd = cesd;
run;
Now we can use the bubble2 statement (close kin of the plot2 statement, section 5.1.2) to add both gender-specific variables to the plot. While we're at it, we relabel the x-axis to no longer be gender specific and specify that the right y-axis is not to be labeled.
proc gplot data = twocolors;
bubble malecesd*age=i1 / bscale = radius bsize=200
bcolor = blue bfill = solid;
bubble2 femalecesd*age=i1 / bscale = radius bsize = 200
bcolor = pink bfill = solid noaxis;
label malecesd="CESD";
run;
As in the previous bubble plot example, the scale is manipulated arbitrarily so that the SAS and R figures are similar.
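We're somewhat fortunate here that the ranges of the two gender-specific CESD scores are similar, since bubble2 plots its values against the right-hand axis, which is scaled separately from the left one.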
R
In the comments for Example 7.28, we suggested the following simple R code.
load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
femalealc = subset(ds, female==1 & substance=="alcohol")
malealc = subset(ds, female==0 & substance=="alcohol")
with(malealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="blue"))
with(femalealc, symbols(age, cesd, circles=i1,
inches=1/5, bg="pink", add=TRUE))
This does generate a plot, but it could be misleading: the circle sizes are scaled relative to the largest value within each symbols() call. That could occasionally be desirable, but it's more likely that we'd like a single scale for the circles. R code for this can be written in a single statement:
load(url("http://www.math.smith.edu/sasr/datasets/savedfile"))
with(ds, symbols(age, cesd, circles=i1, inches=1/5,
bg=ifelse(female==1,"pink","blue")))
Here the ifelse() function (section 1.11.2) generates a different circle fill color depending on the value of female.
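To see what ifelse() returns here, and how the same idea might extend to a grouping variable with more than two levels, here is a minimal sketch on made-up data. The toy data frame, the palette3 vector, and the color choices are assumptions for illustration only; none of the values come from the HELP dataset.

```r
# Toy data (hypothetical values, not taken from the HELP dataset)
toy = data.frame(age = c(25, 32, 41, 47),
                 cesd = c(20, 35, 28, 44),
                 i1 = c(2, 10, 5, 30),
                 female = c(1, 0, 0, 1),
                 substance = c("alcohol", "cocaine", "heroin", "alcohol"))

# ifelse() is vectorized: it returns one color per row
twocol = with(toy, ifelse(female == 1, "pink", "blue"))

# For more than two levels, index a named palette by the grouping variable
palette3 = c(alcohol = "orange", cocaine = "purple", heroin = "darkgreen")
threecol = unname(palette3[as.character(toy$substance)])

# A single symbols() call keeps one common scale for all circles
with(toy, symbols(age, cesd, circles = i1, inches = 1/5, bg = threecol))
```

Because bg accepts a vector of colors, any rule that produces one color per observation can be used in place of ifelse().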
The resulting plots are shown below.
6 comments:
How could one extend the example to coloring by a fourth variable with more than two options? Is it also possible to combine it with adding bubble labels by a 5th variable?
Agree with the previous comment; the fourth variable being limited to a cardinality of 2 in SAS is hardly useful.
Please see example 8.5 (http://sas-and-r.blogspot.com/2010/09/example-85-bubble-plots-part-3.html) to see this done, folks. The sgplot procedure also does it trivially:
data test;
do i = 1 to 40;
cat = ceil(i/10);
x = normal(0) - cat;
y = x + normal(0);
size = normal(0);
output;
end;
run;
proc sgplot data = test;
bubble x=x y=y size=size / group=cat;
run;
quit;
This is great, thanks. Is there a way to restrict the Z value to limit outliers? All of my points are "significant" but even after log transforming I still have one or two points that are much larger than the others, dwarfing the majority of bubbles.
Thanks!
Hi Justin--
My first thought would be to handle this on a case-by-case basis, meaning to arbitrarily remove the large values by hand before plotting the data.
But it would be an interesting exercise to construct a function to detect range issues like this. You could also embed the R code in a function and include an option to trim the n largest values before plotting.
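A minimal sketch of that second idea might look like the following. The function name trimmed_symbols and its ntrim argument are hypothetical, invented for this illustration, and the data are made up; only symbols() and order() are existing R functions.

```r
# Hypothetical helper: drop the ntrim largest size values, then plot the rest
trimmed_symbols = function(x, y, size, ntrim = 0, ...) {
  keep = rep(TRUE, length(size))
  if (ntrim > 0) {
    # order() with decreasing=TRUE lists the positions of the largest sizes first
    biggest = order(size, decreasing = TRUE)[seq_len(min(ntrim, length(size)))]
    keep[biggest] = FALSE
  }
  symbols(x[keep], y[keep], circles = size[keep], ...)
  invisible(keep)   # logical vector marking the points that were plotted
}

# Made-up example: one size value dwarfs the others
x = 1:6
y = c(3, 5, 2, 6, 4, 1)
size = c(1, 2, 1.5, 50, 2.5, 1)
kept = trimmed_symbols(x, y, size, ntrim = 1, inches = 1/5, bg = "blue")
```

Returning the keep vector makes it easy to report which observations were left out of the plot.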
The latter was a great suggestion. I was actually able to embed it into a DESeq2 analysis, co-opting the way that heatmaps handle outlier issues and applying it to this. Thanks again.