Thursday, January 26, 2012

SAS Macro Simplifies SAS and R integration (Updated)

Many of us feel very enthusiastic about R. It's free, it features cutting-edge applications, it has a large community of users contributing for mutual benefit, and on and on. There are also many things to like about SAS, including stability, backwards compatibility, and professional support. The way to be the best analyst you can be is to be flexible and have as many tools at your disposal as you can manage. That's the main motivating principle behind our book and what we do here in this blog.

Today we call attention to a SAS macro that greatly eases calling R from SAS. Published last month in the Journal of Statistical Software, the macro (written by Xin Wei of Roche Pharmaceuticals) is called Proc_R, and we discuss its installation and use today. For a fuller write-up, see the paper. For SAS users, the macro is a huge productivity booster: you can complete data management and/or partial data analysis in SAS, skip out quickly to R for analyses that are awkward or impossible in SAS, then return to SAS to finish. For people in industry, this may also ease integrating R into documentation systems built for SAS code. See this post on DecisionStats for a review of other integration attempts.

Getting ready

1. Download the "SAS source code" and the "Replication code and instructions".

2. Move the macro somewhere you have write access.

3. Open the macro in a text editor and change line 46 so that the rpath option points to the location of your R executable.

(4. If you're running Windows 7 or Vista, and you have SAS 9.1 or above, follow the instructions in the PDF in the second supplemental file you downloaded. This makes a shortcut for a special version of SAS. I'm not at all sure why you have to do this, though. I had the same results running in my usual SAS set-up.)

That's it! The way the macro works is to read in your R code as a SAS data set, write it out to a file, and call R to run it, then do a bunch of post-processing. The basic macro call looks like this:

%include "C:\ken\sasmacros\Proc_R.sas";
%Proc_R (SAS2R =, R2SAS =);
Cards4;

******************************
***Please Enter R Code Here***
******************************

;;;;
%Quit;

You just replace the starred lines with R code, and run-- the R results, if any, appear in your SAS output and/or results windows. The SAS2R value is a list of the names of SAS data sets you'd like to send to R; they're added into the R environment before your code is executed. The R2SAS value is a list of the names of R objects (that can be coerced to data frames) that you'd like to become SAS data sets.
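As a quick smoke test of the round trip described above, here is a minimal sketch that sends nothing to R and asks for one object back. The include path and the c:/temp directory are assumptions (use your own locations, and make sure the directory exists), and the object name hello is purely illustrative.

```sas
/* Assumed location of the macro; change to wherever you saved it. */
%include "C:\ken\sasmacros\Proc_R.sas";

/* No SAS data sets go to R; the R object "hello" should come back as a SAS data set. */
%Proc_R (SAS2R =, R2SAS =hello);
Cards4;
setwd("c:/temp")
print("Hello from R")
hello = data.frame(x = 1:3, y = (1:3)^2)
;;;;
%Quit;

/* If the round trip worked, this prints the three rows created in R. */
proc print data = hello; run;
```

If the R echo shows up in the SAS output and the final proc print lists a data set called hello, then the R path, the working directory, and the import step are all working.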

Use

Here's a trivial example-- generate two data sets in SAS, send them to R to run linear regressions, and send the resulting parameter estimates back to SAS.

data test;
  do i = 1 to 1000;
    x = normal(0);
    y = x + normal(0);
    output;
  end;
run;

data t2;
  do i = 1 to 100;
    x = normal(0);
    y = x + uniform(0);
    output;
  end;
run;

%include "C:\Proc_R.sas";
%Proc_R (SAS2R =test t2, R2SAS =mylm mylm2);
Cards4;
setwd("c:/temp")
an.lm = with(test,lm(y ~x))
mylm = t(coef(an.lm))
summary(an.lm)

an.lm2 = with(t2,lm(y~x))
mylm2 = t(coef(an.lm2))
;;;;
%Quit;

proc print data = mylm; run;
proc print data = mylm2; run;

And here's what you get in the SAS output.

[First, proc_r result]

******************R OUTPUT***********************

R_OUTPUT_LOG

> setwd("c:/temp")
> library(grDevices)
> png("c:/temp/....png")
> test<- read.csv('c:/temp/test.csv')
> t2<- read.csv('c:/temp/t2.csv')
> an.lm = with(test,lm(y ~x))
> mylm = t(coef(an.lm))
> summary(an.lm)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8571 -0.6430 -0.0051  0.6713  3.5903

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.008568   0.031686    0.27    0.787
x           1.020640   0.033315   30.64   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.002 on 998 degrees of freedom
Multiple R-squared: 0.4846, Adjusted R-squared: 0.4841
F-statistic: 938.5 on 1 and 998 DF, p-value: < 2.2e-16

>
> an.lm2 = with(t2,lm(y~x))
> mylm2 = t(coef(an.lm2))
> write.csv(mylm,'mylm.csv',row.names=F)
> write.csv(mylm2,'mylm2.csv',row.names=F)
> dev.off()
null device
1
> q()
> proc.time()
   user  system elapsed
   0.28    0.10    0.37


[Here are the proc print results]
Obs     _Intercept_               x
  1    0.0085676126    1.0206400545

Obs     _Intercept_               x
  1     0.528410053    0.9851225238

(Page breaks and some extraneous stuff removed.)

It's pretty magical for a SAS user to see R living in the SAS output like this. But there are some caveats. First, this is a Windows-only macro. If you run SAS on *nix, you may not be able to get it to work. Second, while the article has examples of graphics from R neatly appearing in SAS, this failed for me. This may be because I run SAS 9.3, while the author of the macro is still using an earlier version of SAS. I may try to diagnose and fix this problem, and will update this entry if I find a fix. (Fixed! See update below.)

(UPDATE: Reader Abhijit suggested a setwd() in the R code as a fix for the graphics problem. This works, and I now get R graphics in my SAS results viewer. Even more magical. Code and output above updated to show this. Thanks, Abhijit!)


However, these seem like minor problems, compared with the overall simplification offered by the macro. It's been of great use to me in the past few months, and I expect it will help others as well. Many thanks and congratulations to Xin Wei!

25 comments:

Abhijit said...

I was playing around with this macro this afternoon, and I initially had the same problem with graphics. The author is also aware of the problem, which seems to come from the full name of the location of the temp graphics file being too long. If I used a setwd() in the R code with a short path, it actually works fine. It's a small note in the paper, maybe the last page or so.

Ken Kleinman said...

Entry updated. Thanks for the fix!

Max said...

Ken, trying to replicate the example of the above post, I keep getting the following error:

WARNING: Apparent symbolic reference FGNAME not resolved.
$$$$&fgname
WARNING: Apparent symbolic reference FGSW not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric
operand is required. The condition was: &fgsw=1
ERROR: The macro QUIT will stop executing

I have R 12.1, SAS 9.2 on Windows XP.

I have no idea about what's going on.

Ken Kleinman said...

Hi, Max--

I don't really understand the internals of what the macro does. A quick peek at the code, though, suggests to me that the fgsw macro variable is meant to be a test of whether the R code ran at all. And I think the fgname variable might have to do with the working directory. So my first instinct would be to check the location of the R executable in line 46. And my second would be to check whether the setwd() directory exists on your computer.

Ken

Xin said...

Thanks to Ken for the nice comments and to Max for the interest. I experienced the same error message when I tried this macro with SAS 9.2 on Windows 7. The cause is a malfunction of the "pipe" feature of base SAS in this environment. I do have a workaround posted on the journal website, which is to launch SAS from a modified desktop shortcut. I am not sure whether your problem is the same as mine, but check your SAS log to see whether it contains the following error message:

Stderr output:
There is not enough space on the disk.

If it does, I am fairly confident that my workaround will fix your issue.

Xin said...

Also, this "FGNAME" issue only concerns the display of R figures in the SAS results viewer. In other words, the statistical analysis and data exchange are not affected even when this error appears. You can always write your R figures to a user-defined location rather than viewing them on the fly.
Max's problem is unusual, though, because he is running SAS 9.2 on Windows XP. When I worked with SAS to develop a fix for the pipe function in SAS 9.2 on Windows 7, they told me that SAS 9.2 under Windows XP is not covered by their tech support. I would still suggest using the modified desktop shortcut to address your problem.

Max said...

Ken + Xin,
thank you for your replies.
I realized that I gave you inaccurate information: I run SAS 9.1 (not 9.2 as mentioned).
Both the location of R.exe and the setwd() are OK. In fact, as Xin notes the R log is correctly shown in my SAS output window.
I've also tried the workaround but got the same results.

Naas said...

Things start to go wrong for me around:
NOTE: The infile PROC_R is:
Unnamed Pipe Access Device,

PROCESS=C:\Progra~1\R\R-2.13.0\bin\R.exe CMD BATCH --vanilla --quiet
"C:/Users/IGNATI~1/AppData/Local/Temp/SAS Temporary Files/_TD5188/r_code1643816702.r"
"c:/temp/r_log_1643816702.txt",
RECFM=V,LRECL=256

Stderr output:
The system cannot find the path specified.

Any ideas what the problem could be ?

Xin said...

If you run the following code,
%let saswork=%sysfunc(pathname(work));
%put &saswork;
what kind of path do you get?

Naas said...

Currently I get (this is after modifying the startup as per the manual):
1 %let saswork=%sysfunc(pathname(work));
2 %put &saswork;
C:\Users\IGNATI~1\AppData\Local\Temp\SAS Temporary Files\_TD7716


If I change the SAS work library to c:/temp, I get:
C:\Temp\_TD8032

Anonymous said...

could you give a hello world example to test if it works ?

zepel said...

I have a similar problem to Max's.

I have R 2.11.1, SAS 9.2, EnterpriseGuide4.3 on Windows XP.

zepel said...

lol

This :

%Proc_R (SAS2R =, R2SAS =);
Cards4;
;;;;
%Quit

gives a different result than this :

%Proc_R (SAS2R =, R2SAS =);
Cards4;
;;;;
%Quit

... no real difference as you can see, only indentation.

1 : no errors. (probably not working though)
2 : a lot of errors and 6 tables created

Wasted a lot of time to detect this

zepel said...

Sorry to flood, but the tabs are not showing.

The first block of code should have its last 3 lines indented:

%Proc_R (SAS2R =, R2SAS =);
	Cards4;
	;;;;
	%Quit

Xin said...

hi, everyone:
You are welcome to send me email with your comments or questions. I would also be glad to test your R code in my setting to help diagnose what the error might be. Please also make sure your R code works in a stand-alone R IDE before embedding it in SAS.
thank you all for the interest.

Ken Kleinman said...

Thanks, Xin, for helping people with installation and start-up. Folks, this is not child's play-- it's worth a little pain and time to get it up and running.

riskminds said...

Hi, does anybody know whether this setup also works with a Citrix client?

Best rgds

Nils

Pierre said...

Hi everyone,

Maybe I misunderstood this sentence from Xin's paper: "The draft R script is interrogated and modified by SAS macro so that it can be executed in SAS environment".
Does that mean one can use a global macro variable such as "&myname." inside the cards4 block?
I have not been able to do so. R doesn't recognize '&' and stops the execution. Moreover, I have read that macro references inside a cards/datalines block are not resolved.
Could I have some explanation, please?

Best regards

Pierre

ps : I submit a similar code

***************************************;
%let sasds=test;
%Proc_R (SAS2R=test,R2SAS=);
cards4;

library(lattice)
attach(&sasds.)
barchart(var1~var2,data=&sasds.)

;;;;
%quit;
***********************************;

Tabuchi said...

Thank you very much for proc_R.
When I run examples 1-4 (from the attachment file) with proc_R, I always get a WARNING about the length of var1, using SAS 9.2 and R 2.14.2. Is a correction for this available?

Ken Kleinman said...

I don't know if Xin is still following this post, but it would likely be helpful to give a little more detail. Where exactly does the error come up, and what does it say?

Anonymous said...

Has anyone tried to execute this code within a macro %DO loop? I'd like to run PROC_R for each level of a BY variable in my data set.

Ken Kleinman said...

I've not tried this, but I think it would be a lot more efficient to use a by-group equivalent in R, and just import/export the data sets once.

Martin said...

Hi Ken,

Thanks for a cool post. I wanted to know: In terms of ease of use, functionality, lack of bugs etc., how does Xin's macro compare to PROC IML; SUBMIT / R;

Xin's take is: Base SAS "lacks a well documented and user-friendly integration for R programming. A programming interface of base SAS and R is highly desirable because base SAS and SAS/STAT users frequently need to access the novel statistical/visualization tools that have been developed in the open source R project and not yet been implemented in SAS." I think he's saying it's harder to code R from IML. SAS is trying to improve this.

I ask because I do not have the latest version of PROC IML that is capable of calling R, and at this time I do not know when I will be able to get my hands on the latest version (minimum IML 9.22 is required).

Regards,
Martin.

Anonymous said...

Although I neither know nor use R myself, the Proc_R macro seems a very welcome find, and I set out to test it and find out how I can provide it to our SAS users, some of whom already use R. There seems to be a growing interest in R, so combining SAS and R felt like a good proposition provided there is no extra cost.

My test environment is a 64-bit Windows 7 PC with a local installation of SAS 9.3 and EG 5.1.
R-2.15.3 was installed on the same workstation. Since we're located in Helsinki, Finland, we use localized Windows settings for time and date formatting etc.

Others, too, have noted that in some cases there are problems with the graphics display part.
The error message involves macro variables FGNAME and FGSW.
The solutions that were suggested didn't work for me at all.

Apparently the Windows directory listing returned by the dir command in the following statement is not in the expected form:
FileName MyDir Pipe "dir &_sasdirec /a:-d";
I assume it is the Windows settings for Finland that make the output different from what the code expects.
This may well be the case in most of Europe, except perhaps Great Britain.

The code below is just the last part of Proc_R (display R graphics).
The most important changes are:
1. the dir command switches;
2. datetime formats in several places;
3. removed some unnecessary code involving a temp variable;
4. scan function detecting filename extension.

Using the amended PROC_R macro I was able to run examples 3 and 4 successfully (as far as I was able to judge).
I ran most tests in a SAS DM-session, but at least examples 1 & 2 which do not involve graphics, worked from EG as well.

Further testing on our SAS server with R scripts provided by our users is still needed.


*****display R graphics*****;
%let _sasdirec=%bquote("&sasdirec");

FileName MyDir Pipe "dir &_sasdirec /a:-d /-c /t:w /n";
data file;
  infile MyDir dsd lrecl=300 pad;
  input @1 line $255.;
  format line $255. _crtime datetime24.;
  crtime=substr(line,1,17);
  if trim(left(scan(lowcase(line),-1,'.'))) in ('gif','png','jpeg','jpg','ps') then do;
    _crtime=input(crtime,ANYDTDTM17.);
    filename=trim(left(scan(line,-1,' ')));
  end;
run;

proc sort data=file out=file2; by descending _crtime descending filename;
  where trim(left(scan(lowcase(filename),-1,'.'))) in ('gif','png','jpeg','jpg','ps');
run;

data _null_;
  set file2;
  if _n_=1 then do;
    call symput('fgsw',put(input("&beforetm",datetime19.)<(input(crtime,ANYDTDTM17.)+60),best.));
    call symput('fgname',trim(left(filename)));
  end;
run;
%put $$$$&fgname;


Arjen Raateland
SAS support
Finnish Environment Institute, Helsinki

Anonymous said...

P.S.
One of the examples involves SASHELP.COMPANY which is not available on SAS installations without Enterprise Miner.
However, it is not at all essential what data the data set in the example contains.
SASHELP.COMPANY can be replaced with SASHELP.CLASS.