Monday, July 18, 2011

Example 9.3: augmented display of contingency table


SAS and R often provide different levels of details from output. This is particularly true for the descriptive analysis of contingency tables, where SAS makes it easy to display tables with additional quantities (such as the observed cell count).

The mosaic package has added functionality to calculate these quantities in R. We demonstrate using an example from the HELP dataset.

R

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
library(mosaic)
ds$gender = ifelse(ds$female==1, "female", "male")
ds$homeless = ifelse(ds$homeless==1, "homeless", "housed")
tab = xtabs(~ gender + homeless, data=ds)
> tab
homeless
gender homeless housed
female 40 67
male 169 177
> xchisq.test(tab)

Pearson's Chi-squared test with Yates' continuity correction

data: tab
X-squared = 3.8708, df = 1, p-value = 0.04913

40.00 67.00
( 49.37) ( 57.63)
[1.78] [1.52]
<-1.33> < 1.23>

169.00 177.00
(159.63) (186.37)
[0.55] [0.47]
< 0.74> <-0.69>

key:
observed
(expected)
[contribution to X-squared]


We see that there is a borderline statistically significant association between gender and homeless status in the HELP study. We interpret that we see fewer than expected females who are homeless, and more males who are homeless.

Another idea is to use graphical depictions of the association in this table. One approach is a mosaic plot (note: no relation to Project MOSAIC and the mosaic package). A mosaic plot starts as a square with area equal to one. It is divided into columns based on the prevalence in each of the values for the column variable (in this case, gender). Then each bar is divided vertically based on the conditional probability of the other variable within that category.

Another graphical display of a table is the association plot. In an association plot, there is also a box for each cell of the table. The area of the box is proportional to the difference between the observed and expected (assuming no association) frequencies. In a typical presentation, excess observed counts are black and above the line, while deficient counts are red and below the line.

Above, we show the mosaic plot (on the left) and association plot (on the right). Both of these displays demonstrate that there is an association. The mosaic plot indicates that only about a quarter of the sample is female (indicated by the width of the columns), and that homelessness is present in about half the subjects (area shaded in light grey). The slight association demonstrated is that there are fewer homeless women than expected (since the horizontal line moves down between the first and second column). Similarly, for the association plot we note that the expected cell count is less than the observed (indicated in red with values below the line) for the female homeless group.

par(mfrow=c(1,2))
mosaicplot(tab, color=TRUE, main="mosaic plot")
assocplot(tab)
title("association plot")


SAS
As in Example 8.32, we find SAS macros for mosaic plots among the contributions of Michael Friendly. In this complex case, they are somewhat more difficult to access than others. The code for the plots themselves can be downloaded here, while it's useful to also run a wrapper macro. After downloading the files, the following code can be used to make the figure below.

title 'Install mosaic modules';
* location of the zipped files;
filename mosaic 'c:\ken\sasmacros\mosaics';
* storage location of compiled macros;
libname mosaic 'c:\ken\sasmacros\mosaics';

* Code to read in, compile and store the macros;
proc iml ;
reset storage=mosaic.mosaic;
%include mosaic(mosaics) ;
store module=_all_;
show storage;
quit;

* Prep: create the table, save the cell counts;
proc freq data = "c:\book\help.sas7bdat";
tables homeless * female / out=outhelp;
run;

* Read in the wrapper macro;
%include "c:\ken\sasmacros\mosaics\mosaic.sas";

* Make the plot;
%mosaic(data=outhelp,var = female homeless,
sort=homeless descending female, space = 1 1);

The sort and space options make the results more similar to those shown for mosaicplot(). In this version, the colors reflect the signs of the residuals.

1 comment:

Rick Wicklin said...

Since Michael Friendly's macros require the use of SAS/IML software, I'll mention that SAS/IML Studio, which is included for free with a SAS/IML license, also supports interactive mosaic plots. You can use them in a point-and-click manner or create them as part of a program. See http://support.sas.com/documentation/cdl/en/imlsug/64254/HTML/default/viewer.htm#imlsug_ugtwod_sect002.htm