
Monday, December 14, 2009

Example 7.18: Displaying missing value categories in a table

When displaying contingency tables (section 2.3.1), it is sometimes useful to show the missing data category and sometimes useful to hide it. By default, both SAS and the usual R command tabulate only those observations where both factors are observed.

In this example, we generate some multinomial data (section 1.10.4) and then produce tables with and without missing data categories.

SAS

Generate the multinomial data, then use uniform random draws to set about 20% of the values of each variable to missing:

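data blog;
  do i = 1 to 300;
    /* categories 1, 2, 3 with probabilities .3, .4, and the remaining .3 */
    x = rand("TABLE", .3, .4);
    y = rand("TABLE", .3, .4);
    /* set each value to missing with probability .2 */
    if uniform(0) gt .8 then x = .;
    if uniform(0) gt .8 then y = .;
    output;
  end;
run;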


Print the default table, which uses only the complete observations. The norow and nocol options suppress the row and column percentages to reduce output, as in section 4.6.9.

proc freq data=blog;
tables y*x / norow nocol;
run;


This produces:

Table of y by x

y         x

Frequency|
Percent  |       1|       2|       3|  Total
---------+--------+--------+--------+
       1 |     16 |     13 |     18 |     47
         |   8.16 |   6.63 |   9.18 |  23.98
---------+--------+--------+--------+
       2 |     18 |     32 |     22 |     72
         |   9.18 |  16.33 |  11.22 |  36.73
---------+--------+--------+--------+
       3 |     28 |     31 |     18 |     77
         |  14.29 |  15.82 |   9.18 |  39.29
---------+--------+--------+--------+
Total          62       76       58      196
            31.63    38.78    29.59   100.00

Frequency Missing = 104


The missing categories are included through the missprint option.

proc freq data = blog;
tables y*x / norow nocol missprint;
run;


This produces:

Table of y by x

y         x

Frequency|
Percent  |       .|       1|       2|       3|  Total
---------+--------+--------+--------+--------+
       . |     12 |     12 |     20 |     14 |      .
         |      . |      . |      . |      . |      .
---------+--------+--------+--------+--------+
       1 |     10 |     16 |     13 |     18 |     47
         |      . |   8.16 |   6.63 |   9.18 |  23.98
---------+--------+--------+--------+--------+
       2 |     17 |     18 |     32 |     22 |     72
         |      . |   9.18 |  16.33 |  11.22 |  36.73
---------+--------+--------+--------+--------+
       3 |     19 |     28 |     31 |     18 |     77
         |      . |  14.29 |  15.82 |   9.18 |  39.29
---------+--------+--------+--------+--------+
Total           .       62       76       58      196
                .    31.63    38.78    29.59   100.00

Frequency Missing = 104


Note that if there are no missing values, SAS will not print the rows and columns headed with a '.', which is analogous to the "ifany" option in R shown below.

R

First, generate the data:

library(Hmisc)
x <- rMultinom(matrix(c(.3,.3,.4),1,3),300)
y <- rMultinom(matrix(c(.3,.3,.4),1,3),300)


Then, generate some random Uniforms to censor some of the observed data:

censprobx <- runif(300)
censproby <- runif(300)


Censor the data:

x[censprobx > .8] <- NA
y[censproby > .8] <- NA
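
For readers without Hmisc installed, the same simulation can be sketched in base R. This is only an illustration under assumptions that are not part of the original example: it uses sample() in place of rMultinom(), and the seed value is arbitrary, chosen only so that the run can be repeated.

```r
# Sketch: base R version of the simulation (no Hmisc needed).
# The seed is an arbitrary choice made only for reproducibility.
set.seed(2009)
n <- 300
probs <- c(.3, .3, .4)

# Draw category labels 1, 2, 3 with the stated probabilities
x <- sample(1:3, n, replace = TRUE, prob = probs)
y <- sample(1:3, n, replace = TRUE, prob = probs)

# Independently set roughly 20% of each variable to missing
x[runif(n) > .8] <- NA
y[runif(n) > .8] <- NA

# Default table: only pairs where both x and y are observed
table(y, x)
```

Because the random draws differ, the counts will not match the tables shown in this entry.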


Produce the default table (omits any missing data):

table(y,x)

   x
y    1  2  3
  1 18 18 29
  2 17 21 22
  3 20 30 40
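
The SAS output above reports cell percentages and margins along with the counts. A rough R counterpart can be sketched with prop.table() and addmargins(). The small vectors below are made up for illustration so that the block runs on its own; they are not the simulated data from this entry.

```r
# Toy data (made up for illustration), one missing value in each vector
x <- c(1, 2, 2, 3, NA, 1, 3, 2)
y <- c(1, 1, 2, NA, 3, 2, 3, 2)

# Default table: complete pairs only
tab <- table(y, x)

# Cell percentages of the complete-case total, similar to "Percent" in proc freq
pct <- round(100 * prop.table(tab), 2)
pct

# Counts with row and column totals appended
addmargins(tab)
```

As in SAS, the percentages here are computed over the complete pairs only, not over all observations.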


Make the table which includes the missing category:

table(y, x, useNA="ifany")

    x
y     1  2  3 NA
  1  18 18 29  9
  2  17 21 22 17
  3  20 30 40 17
  NA 14  5 14  9
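
SAS prints a "Frequency Missing" line with the number of observations dropped from the table. R does not print this, but the count is easy to recover. The sketch below uses made-up toy vectors rather than the simulated data, and the variable names n_missing and n_complete are just illustrative.

```r
# Toy data (made up for illustration), one missing value in each vector
x <- c(1, 2, 2, 3, NA, 1, 3, 2)
y <- c(1, 1, 2, NA, 3, 2, 3, 2)

# Observations missing on x, on y, or on both
n_missing <- sum(is.na(x) | is.na(y))

# Observations that appear in the default table
n_complete <- sum(complete.cases(x, y))

# The same missing count, as the difference between the two tables
n_from_tables <- sum(table(y, x, useNA = "ifany")) - sum(table(y, x))

c(missing = n_missing, complete = n_complete)
```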


The useNA option also accepts the values "no" and "always". The value "no" is the default in R and matches the default behavior in SAS. The value "always" prints the NA row and column even when there are no missing values; this option is not available in SAS. SAS, however, shows the total number missing in any case.
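
The difference between the three useNA values is easiest to see when only one of the two variables has missing data. The following sketch uses a small made-up example, not the simulated data, in which x has one missing value and y has none.

```r
# Toy data (made up for illustration)
x <- c(1, 2, 2, 1, NA)   # one missing value
y <- c(1, 1, 2, 2, 2)    # fully observed

t_no     <- table(y, x, useNA = "no")      # NA categories never shown
t_ifany  <- table(y, x, useNA = "ifany")   # NA shown only where it occurs (x)
t_always <- table(y, x, useNA = "always")  # NA shown for both x and y

dim(t_no)      # 2 rows, 2 columns
dim(t_ifany)   # 2 rows, 3 columns: NA column for x only
dim(t_always)  # 3 rows, 3 columns: NA row for y contains only zeros
```

With "ifany", the decision is made separately for each variable, so y gets no NA row here.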