**SAS**

The

`where`statement (section A.6.3) is a powerful and useful tool for subsetting on the fly. (Other options for subsetting typically require data steps or data set options in the

`proc`statement.) But one problem with subsetting in SAS is that the output doesn't show the conditioning. This is a little macro to correct this problem:

%macro where(wherest, titleline);

where &wherest;

title&titleline "where %str(&wherest)";

%mend where;

If you use

`%where(condition, n);`in place of

`where condition;`the condition is both implemented and will appear in the output as the nth title line. This takes advantage of the fact that the title statements just need to appear before the procedure is run, as opposed to before the

`proc`statement.

For example:

title "HELP data set";

proc means data="c:\book\help.sas7bdat";

%where (female eq 1,2);

var age;

run;

Produces the output:

HELP data set

where female eq 1

The MEANS Procedure

Analysis Variable : AGE

Mean

------------

36.2523364

------------

Note that you need to use the testing forms such as

`eq`and not the assignment

`=`in your

`where`statements for this to work.

This tip is cross-posted on my list of SAS tricks.

**R**

Within R, many functions (e.g.

`lm()`) provide support for a

`subset`option (as in example 3.7.5). Indexing can be used to restrict analyses (B.4.2). In addition, the

`subset()`function can be embedded in a function call to make explicit what subset of observations will be included in the analysis. We demonstrate calculating the mean of female subjects all three ways below.

> ds = read.csv("http://www.math.smith.edu/r/data/help.csv")

> coef(lm(age ~ 1, subset=female==1, data=ds))

(Intercept)

36.25234

> mean(ds$age[ds$female==1])

[1] 36.25234

> mean(subset(ds,female==1,age))

age

36.25234

In interactive R the code and output lie together and it is usually trivial to trace the subsetting used to produce output. But in literate programming settings, or when output may be cut-and-pasted from R to a document, it may be valuable to mirror the SAS macro presented above.

Here is a function to display the subsetting condition for simple functions. It requires some sophistication to turn expressions and variables into objects (and vice-versa). The use of

`deparse(substitute())`allows us to create a character variable with the name of the subset condition and function, while

`eval()`is used to pass the name of the variable to the

`subset()`function. A similar functionality for more complex functions (such as those requiring a formula, would require some tweaking; a generic explicit subsetting function could be difficult.

verbose.subset = function(ds, condition, variable, FUN=mean) {

cat(paste("subsetting with condition: ",

deparse(substitute(condition)),

"\n", sep=""))

cat(paste("calling the function: ",

deparse(substitute(FUN)),

"\n", sep=""))

FUN(subset(ds, condition, select=eval(variable)))

}

This yields the following output:

> verbose.subset(ds, female==1, "age")

subsetting with condition: female == 1

calling the function: mean

age

36.25234

## No comments:

Post a Comment