Monday, December 14, 2009

Example 7.18: Displaying missing value categories in a table

When displaying contingency tables (section 2.3.1), there are times when it is useful to either show or hide the missing data category. Both SAS and the typical R command default to displaying the table only for observations where both factors are observed.

In this example, we generate some multinomial data (section 1.10.4) and then produce tables with and without missing data categories.

SAS

Generate the multinomial data, uniform data, and use the latter to censor the former:

data blog;
do i = 1 to 300;
x = rand("TABLE",.3,.4);
y = rand("TABLE",.3,.4);
if uniform(0) gt .8 then x = .;
if uniform(0) gt .8 then y = .;
output;
end;
run;


Print the default table with only complete data. Note the options used to reduce output, as in section 4.6.9.

proc freq data=blog;
tables y*x / norow nocol;
run;


This produces:

Table of y by x

y x

Frequency|
Percent | 1| 2| 3| Total
---------+--------+--------+--------+
1 | 16 | 13 | 18 | 47
| 8.16 | 6.63 | 9.18 | 23.98
---------+--------+--------+--------+
2 | 18 | 32 | 22 | 72
| 9.18 | 16.33 | 11.22 | 36.73
---------+--------+--------+--------+
3 | 28 | 31 | 18 | 77
| 14.29 | 15.82 | 9.18 | 39.29
---------+--------+--------+--------+
Total 62 76 58 196
31.63 38.78 29.59 100.00

Frequency Missing = 104


The missing categories are included through the missprint option.

proc freq data = blog;
tables y*x / norow nocol missprint;
run;


This produces:

Table of y by x

y x

Frequency|
Percent | .| 1| 2| 3| Total
---------+--------+--------+--------+--------+
. | 12 | 12 | 20 | 14 | .
| . | . | . | . | .
---------+--------+--------+--------+--------+
1 | 10 | 16 | 13 | 18 | 47
| . | 8.16 | 6.63 | 9.18 | 23.98
---------+--------+--------+--------+--------+
2 | 17 | 18 | 32 | 22 | 72
| . | 9.18 | 16.33 | 11.22 | 36.73
---------+--------+--------+--------+--------+
3 | 19 | 28 | 31 | 18 | 77
| . | 14.29 | 15.82 | 9.18 | 39.29
---------+--------+--------+--------+--------+
Total . 62 76 58 196
. 31.63 38.78 29.59 100.00
Frequency Missing = 104


Note that if there are no missing values, SAS will not print the rows and columns headed with a '.' which is analogous to the "ifany" option in R shown below.

R

First, generate the data:

library(Hmisc)
x <- rMultinom(matrix(c(.3,.3,.4),1,3),300)
y <- rMultinom(matrix(c(.3,.3,.4),1,3),300)


Then, generate some random Uniforms to censor some of the observed data:

censprobx <- runif(300)
censproby <- runif(300)


Censor the data:

x[censprobx > .8] <- NA
y[censproby > .8] <- NA


Produce the default table (omits any missing data):

table(y,x)

x
y 1 2 3
1 18 18 29
2 17 21 22
3 20 30 40


Make the table which includes the missing category:

table(y, x, useNA="ifany")

x
y 1 2 3 NA
1 18 18 29 9
2 17 21 22 17
3 20 30 40 17
NA 14 5 14 9


The useNA option also allows the values "no" and "always". The value "no" corresponds to the default behavior in R or SAS, while the "always" option is not available in SAS. SAS, however, shows the total number missing in any case.

4 comments:

Anonymous said...

If one wants to include missing categories (NOT missing value in the original data NOR the freq of missing values) in an output dataset, then how does one od that. I don't want to use the weight option, because for that I will have to add some dummy values to my data (or else create another dataset).

Ken Kleinman said...

I don't think I understand the question. Can you provide an example of what the data look like and how you want the table to look?

Anonymous said...

I would like to display all "Grade"s in the following example even when data is missing. How could I accomplish thsi? Printmiss in my proc freq doesn't seem to work:

/** GRADE 7**/
DATA MATH7;
LENGTH Grade $30;
SET star;
IF recordtype = '01' and CST_CMA_Math_Rpt='1' and CST_Math_Incl IN ("Y", "R") and TestGrdLvl='07';
IF CST_Math_Incl= "R" THEN CST_MATH_PL ='1';
IF CST_MathTst IN (" ", "0")THEN Grade = "Grade 07"; IF CST_MathTst="3" THEN Grade = "Grade 07 Algebra 1"; IF CST_MathTst ="5" THEN Grade = "Grade 07 Geometry";
IF CST_MathTst ="7" THEN Grade = "Grade 07 Algebra 2"; IF CST_MathTst ="4" THEN Grade = "Grade 07 Int Math 1"; IF CST_MathTst ="6" THEN Grade = "Grade 07 Int Math 2";
KEEP StdNum Grade CST_MathTst CST_Math_PL CST_Math_SS ;
RUN;

Ken Kleinman said...

Really hard to tell, without the data and an idea of what you want to get. if can post or send it, I'd be happy to look at it.