**SAS**

The SAS code is particularly tortured, since we must parse the parameter string to extract the table probabilities. We do this with the

`count`and

`find`(section 1.4.6) and

`substr`(section 1.4.3) functions. We also turn the characters into numbers (section 1.4.2). The SAS Macro (section A.8.1) runs a data step, so it cannot be called from within another data step, as would usually be desirable. Instead, you would have to generate the data and then merge it into any additional data that was required. On the whole, it might make more sense to treat every application as a special case and hand-code each one, rather than to use the macro, were a built-in function not available.

%macro rtable(outdata=, outvar=, n=1, p=);

data &outdata (keep=&outvar);

/* In the first section of code, we determine the number

of categories and set up an array to hold the

probabilities for each category. A maximum number of

catagories must be provided; here we use 100. */

nprobs = count(&p,",");

lengthprobs = length(&p);

array probs [100] _temporary_;

/* In the next section we parse the parameter string to

extract the probabilities for each category. */

lastcomma = 0;

lastprob = 0;

do prob = 1 to nprobs;

nextcomma = find(&p,",",lastcomma +1);

probs[prob] = lastprob +

substr(&p,lastcomma+1, (nextcomma-lastcomma -1));

lastcomma = nextcomma;

lastprob = probs[prob];

end;

/* Finally, we generate a uniform random variate and

use it to find the appropriate category. */

do i = 1 to &n;

__x = uniform(0);

&outvar = 1;

do j = 1 to nprobs;

&outvar = &outvar + (__x gt probs[j]);

end;

output;

end;

run;

%mend rtable;

We've left this code so that an extra comma must be included in the probability parameter string. This could be fixed with an additional line or two in the data step.

We can then run the macro and test the resulting data using

`proc freq`.

%rtable(outdata=test2,outvar= mycat, n = 10000,

p = ".1,.2,.3,");

proc freq data = test2; tables mycat; run;

Cumulative Cumulative

mycat Frequency Percent Frequency Percent

------------------------------------------------------

1 1032 10.32 1032 10.32

2 2013 20.13 3045 30.45

3 2956 29.56 6001 60.01

4 3999 39.99 10000 100.00

**R**

In contrast, the R code to write a function (section B.5.2) is trivial, using the

`cut()`function as suggested by Douglas Rivers, discussed in section 1.4.10.

rtable1 <- function(cuts,n) {

output <- cut(runif(n), cuts, labels=FALSE)

return(output)

}

In fact, the function hardly seems worth writing. To use it, you need to include 0 and 1 as the first and last elements of the

`cuts`vector. Results can be examined using the

`table()`function (section 2.2.1)

> mycat1 <- rtable1(c(0,.1,.3,.6,1),10000)

> summary(factor(mycat1))

1 2 3 4

1058 1989 2978 3975

To add a little bit of programming to it, suppose we want to list the probabilities per category, rather than the cutpoints. We'll omit the final category probability, assuming the input vector sums to less than 1 and has 1 fewer elements than the desired number of categories.

rtable2 <- function(p, n) {

cuts <- numeric(length(p) + 2)

cuts[1] <- 0

cuts[length(p) + 2] <- 1

for (k in 1:length(p)) {

cuts[k+1] <- sum(p[1:k])

}

output <- cut(runif(n), cuts, labels=FALSE)

return(output)

}

The results are similar to the simpler function. We can see the results tabulated without explicitly making the result a factor by using the

`table()`function (section 2.2).

> mycat2 <- rtable2(c(.1, .2, .3), 10000)

> table(mycat2)

mycat2

1 2 3 4

961 1986 3058 3995

## 1 comment:

... and proving the point that there are always many ways to skin a cat, XS, commenting on example 7.20, notes that the sample() function may be another more direct way to do this, meeting the goal of the more complex R function we show above.

Post a Comment