**R**

Here we demonstrate this by displaying the means of the `cesd1`, `cesd2`, `cesd3`, and `cesd4` variables measuring depressive symptoms at each of the followup time points for the HELP study.

    ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
    mean(ds[, paste('cesd', seq(1:4), sep = '')], na.rm=TRUE)

which generates the output:

       cesd1    cesd2    cesd3    cesd4
    22.71545 23.58373 22.06855 20.14286

This approach selects a set of variables by generating a character vector of variable names using the `paste()` function (section 1.4.5) and the `seq()` function (section 1.11.3). Then the `mean()` function is applied to the selected columns.
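A note for readers on newer R: the data frame method for `mean()` was made defunct in R 3.0.0, so on current versions the call shown here is expected to return `NA` with a warning instead of the per-column means. The sketch below shows one way to get the same result with `colMeans()`. The small `ds` data frame and the name `cesd_means` are made up for illustration, so the example runs without downloading the HELP data.

```r
# Toy stand-in for the HELP data; these values are invented for illustration.
ds <- data.frame(
  id    = 1:4,
  cesd1 = c(10, 20, NA, 30),
  cesd2 = c(12, NA, 18, 24),
  cesd3 = c(8, 16, 24, NA),
  cesd4 = c(5, 15, 25, 35)
)

# Build the character vector "cesd1", ..., "cesd4", as in the entry.
vars <- paste("cesd", 1:4, sep = "")

# colMeans() computes one mean per selected column, dropping missing values.
cesd_means <- colMeans(ds[, vars], na.rm = TRUE)
print(cesd_means)
```

With the real data, the toy `ds` would be replaced by the `read.csv()` call from the entry; the column selection and the `colMeans()` call stay the same.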

**SAS**

This task is straightforward in SAS, using the `-` syntax (section 1.11.4) in the `var` statement in `proc means`.

    proc means data=ds maxdec=2 n mean;
      var cesd1 - cesd4;
    run;

    The MEANS Procedure

    Variable    Label       N        Mean
    -----------------------------------------
    CESD1       1 cesd    246       22.72
    CESD2       2 cesd    209       23.58
    CESD3       3 cesd    248       22.07
    CESD4       4 cesd    266       20.14
    -----------------------------------------

## 4 comments:

For variables stored in adjacent columns you can use

ds[, names(ds)[grep("var_x", names(ds)):grep("var_y", names(ds))]]

wojteksobala:

For the adjacent columns it's simpler:

mean(subset(ds, select=cesd1:cesd4), na.rm=TRUE)

but the problem doesn't concern it.

I was thinking for the adjacent columns you could just use the column numbers, e.g., ds[, 4:8], but I prefer wojteksobala's approach, which doesn't require you to learn the numeric location of the columns. We'll be using grep() inside the index in next week's entry.

You also don't need `seq(1:4)`; either `1:4` or `seq(1, 4)` is sufficient, but not both.
