Today we call attention to a SAS macro that greatly eases calling R from SAS. Published last month in the Journal of Statistical Software, the macro (written by Xin Wei of Roche Pharmaceuticals) is called Proc_R; below we discuss its installation and use. For a fuller write-up, see the paper. For SAS users, the macro is a huge productivity booster: you can complete data management and/or partial data analysis in SAS, skip out quickly to R for analyses that are awkward or impossible in SAS, then return to SAS for completion. For people in industry, this may also ease integrating R into documentation systems built for SAS code. See this post on DecisionStats for a review of other integration attempts.
1. Download the "SAS source code" and the "Replication code and instructions".
2. Move the macro somewhere you have write access.
3. Open the macro in a text editor and change line 46 so that the rpath option points to the location of your R executable.
(4. If you're running Windows 7 or Vista, and you have SAS 9.1 or above, follow the instructions in the PDF in the second supplemental file you downloaded. This makes a shortcut for a special version of SAS. I'm not at all sure why you have to do this, though. I had the same results running in my usual SAS set-up.)
That's it! The macro reads in your R code as a SAS data set, writes it out to a file, calls R to run it, then does a bunch of post-processing. The basic macro call looks like this:
%Proc_R (SAS2R =, R2SAS =);
cards4;
******************************
***Please Enter R Code Here***
******************************
;;;;
%quit;
Just replace the starred lines with R code and run; the R results, if any, appear in your SAS output and/or results windows. The SAS2R value is a list of the names of SAS data sets you'd like to send to R; they're added into the R environment before your code is executed. The R2SAS value is a list of the names of R objects (that can be coerced to data frames) that you'd like to become SAS data sets.
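The output further down shows the SAS data sets arriving in R through read.csv(), so the exchange appears to be file-based. Here is a rough sketch of that round trip in plain R, with no SAS involved. The temporary directory, the simulated data, and the CSV export at the end are my assumptions for illustration; none of this is taken from the Proc_R source.

```r
# Sketch of a file-based hand-off, using a temporary directory in place
# of c:/temp. Nothing here is copied from the Proc_R macro itself.
work_dir <- tempdir()

# Stand-in for the data set SAS would export before R starts
set.seed(1)
sas_side <- data.frame(x = rnorm(100))
sas_side$y <- sas_side$x + rnorm(100)
write.csv(sas_side, file.path(work_dir, "test.csv"), row.names = FALSE)

# First step on the R side: load the exported data under its SAS name
test <- read.csv(file.path(work_dir, "test.csv"))

# The user's R code runs next
an.lm <- with(test, lm(y ~ x))
mylm <- t(coef(an.lm))

# Assumed last step: objects named in R2SAS are written out for SAS to import
write.csv(as.data.frame(mylm), file.path(work_dir, "mylm.csv"),
          row.names = FALSE)
```

If the macro does work roughly this way, anything that survives a trip through a CSV file (numbers, strings) should transfer cleanly, while R-specific structure such as factors or attributes probably will not.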
Here's a trivial example: generate two data sets in SAS, send them to R to run linear regressions, and send the resulting parameter estimates back to SAS.
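/* 1,000 observations: y is x plus standard normal noise */
data test;
  do i = 1 to 1000;
    x = normal(0);
    y = x + normal(0);
    output;
  end;
run;

/* 100 observations: y is x plus uniform(0,1) noise */
data t2;
  do i = 1 to 100;
    x = normal(0);
    y = x + uniform(0);
    output;
  end;
run;

%Proc_R (SAS2R =test t2, R2SAS =mylm mylm2);
cards4;
an.lm = with(test,lm(y ~x))
mylm = t(coef(an.lm))
an.lm2 = with(t2,lm(y~x))
mylm2 = t(coef(an.lm2))
;;;;
%quit;

proc print data = mylm; run;
proc print data = mylm2; run;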
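The t(coef(...)) calls are presumably there because R2SAS needs objects that can be coerced to data frames, and a fitted lm object cannot. A quick way to check this in plain R is sketched below; can_be_data_frame() is a hypothetical helper I made up for the illustration, not part of Proc_R, and the data are simulated.

```r
# A fitted model is not itself a data frame, but its coefficients can be
# reshaped into a one-row matrix, which as.data.frame() accepts.
set.seed(2)
d <- data.frame(x = rnorm(50))
d$y <- d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

coefs <- t(coef(fit))            # 1 x 2 matrix with column names
coef_df <- as.data.frame(coefs)  # one row, columns "(Intercept)" and "x"

# Hypothetical helper (not part of Proc_R): TRUE if as.data.frame() succeeds
can_be_data_frame <- function(obj) {
  !inherits(try(as.data.frame(obj), silent = TRUE), "try-error")
}

can_be_data_frame(coefs)  # the one-row matrix converts
can_be_data_frame(fit)    # the lm object itself does not
```

Note that the first column is still named "(Intercept)" on the R side, which is not a valid SAS variable name under the default naming rules, so expect it to come back renamed.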
And here's what you get in the SAS log.
[First, proc_r result]
> test<- read.csv('c:/temp/test.csv')
> t2<- read.csv('c:/temp/t2.csv')
> an.lm = with(test,lm(y ~x))
> mylm = t(coef(an.lm))
lm(formula = y ~ x)
    Min      1Q  Median      3Q     Max
-2.8571 -0.6430 -0.0051  0.6713  3.5903
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.008568   0.031686    0.27    0.787
x           1.020640   0.033315   30.64   <2e-16 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.002 on 998 degrees of freedom
Multiple R-squared: 0.4846, Adjusted R-squared: 0.4841
F-statistic: 938.5 on 1 and 998 DF, p-value: < 2.2e-16
> an.lm2 = with(t2,lm(y~x))
> mylm2 = t(coef(an.lm2))
user system elapsed
0.28 0.10 0.37
[Here are the proc print results]
Obs _Intercept_ x
1 0.0085676126 1.0206400545
Obs _Intercept_ x
1 0.528410053 0.9851225238
(Page breaks and some extraneous stuff removed.)
It's pretty magical for a SAS user to see R living in the SAS output like this. But there are some caveats. First, this is a Windows-only macro. If you run SAS on *nix, you may not be able to get it to work.
(UPDATE: Reader Abhijit suggested a setwd() in the R code as a fix for the graphics problem. This works, and I now get R graphics in my SAS results viewer. Even more magical. Code and output above updated to show this. Thanks, Abhijit!)
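For anyone who wants to try the setwd() idea, here is a guess at its general shape in plain R: pick a directory you can write to, make it the working directory, then save the plot to a file there. The directory name, the file name, and the choice of a PNG device are all my assumptions; the fix that actually worked may have looked different, and I have not checked which image formats the macro picks up.

```r
# Sketch of the setwd() idea: choose a writable directory first, then
# save the plot to a file there. Directory and file name are made up.
plot_dir <- file.path(tempdir(), "proc_r_plots")
dir.create(plot_dir, showWarnings = FALSE)
old_wd <- setwd(plot_dir)     # setwd() returns the previous directory

set.seed(3)
x <- rnorm(200)
y <- x + rnorm(200)
an.lm <- lm(y ~ x)

png("residuals.png")          # relative path, so it lands in plot_dir
hist(residuals(an.lm), main = "Residuals")
dev.off()

setwd(old_wd)                 # restore the previous working directory
```

Inside a Proc_R call you would likely point setwd() at a fixed folder rather than tempdir(), since the point is for the image to end up somewhere predictable.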
However, these seem like minor problems compared with the overall simplification offered by the macro. It's been of great use to me in the past few months, and I expect it will help others as well. Many thanks and congratulations to Xin Wei!