Monday, July 25, 2011

Really useful R package: sas7bdat

For SAS users, one hassle in trying things in R, let alone migrating, is the difficulty of getting data out of SAS and into R. In our book (section 1.2.2) and in a blog entry we've covered getting data out of SAS native data sets. Unfortunately, for all of these methods, you need a working, licensed version of SAS.

However, Matt Shotwell has reverse-engineered the sas7bdat file format. This means that you can now read a SAS data set without a working copy of SAS. This is a wonderful thing, and in fact SAS Institute ought to have provided this ability long ago. The package is experimental, but it worked fine for two small data sets. Matt tells me that as of 7/2011, the package only works for sas7bdat files generated on 32-bit Windows systems.

R
Install the package sas7bdat. Then use the read.sas7bdat() function.
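
One way to handle the install step is sketched here, assuming the package is available from CRAN for your version of R. The name ensure_package is a hypothetical helper for illustration, not something the sas7bdat package provides.

```r
# Hypothetical helper: install a package only when it is not already present.
# The installer argument defaults to install.packages() and can be swapped out.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# ensure_package("sas7bdat")   # uncomment to run; assumes CRAN access
```

If the install fails, an older version of R is one possible cause worth checking.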

library(sas7bdat)
helpfromSAS = read.sas7bdat("http://www.math.smith.edu/sasr/datasets/help.sas7bdat")

> is.data.frame(helpfromSAS)
[1] TRUE
> summary(helpfromSAS$MCS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  6.763  21.680  28.600  31.680  40.940  62.180
> with(helpfromSAS, summary(SUBSTANCE))
alcohol cocaine  heroin
    177     152     124

It's unclear why the variable names are all capitalized. That didn't happen in another trial, so it must be something about the way the help.sas7bdat data set was constructed.
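
If upper-case names are a nuisance, they can be converted after the data set is read. This is a minimal sketch using a small made-up data frame (the values are hypothetical, not taken from the HELP data); the same line should apply to any data frame, including one returned by read.sas7bdat().

```r
# Stand-in data frame with upper-case variable names (hypothetical values).
demo <- data.frame(MCS = c(25.1, 40.3),
                   SUBSTANCE = c("alcohol", "heroin"))

# Convert every variable name to lower case.
names(demo) <- tolower(names(demo))
names(demo)
```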

12 comments:

Ken said...

SAS provides a transport format, which does have a public specification: http://support.sas.com/techsup/technote/ts140.html

What they should do is include it as an option in PROC EXPORT and make life easier.

Ken Kleinman said...

Wouldn't you need a running version of SAS to use this method?

The beauty of the R package is that you can get the data out of the SAS format _without_ having to be able to run proc export. Try it out, and you'll see what I mean.

Arsenio said...

I have been using sas7bdat ever since it was published. Amazing work!
XPT files are a hassle in SAS, and the ability to directly read sas7bdat is priceless. So far I haven't seen mistakes in my data sets, which are 100-200 MB in size. One caution, though: it can be a bit slow, roughly 2-3 minutes to read a sas7bdat file of that size, but the advantages far outweigh the cons.
And we all know that SAS folks are not the fastest movers in the world :P

syeds said...

Ken, I have used it without SAS.



Rick Wicklin said...

For those who do have SAS and use SAS/IML, the IML language provides convenient functions for importing and exporting SAS data sets to/from R data frames, and SAS/IML matrices to/from R matrices. You can also call R directly from your SAS/IML program, pass parameters to R, and, in general, combine the two languages. Details at http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#r_toc.htm

Alan Churchill said...

SAS provides an ODBC and OleDB driver for free which works with sas7bdat. Basically, you don't have to have a reader: SAS gives it away. A writer is a whole different ballgame. What Matt did is cool, but he has only just started.

I know the sas7bdat layout and can read and write it in binary. I have .NET code that handles it.

It is an absolute bear to work with and Matt has scratched the surface. That is certainly not to take away what he has done, but his journey is long and writing is a helluva lot harder.

CozyRoc can read/write sas7bdat and has a connector to SQL Server for those interested.

Anonymous said...

Hi, guys. I think this package is great. But would someone let me know how to install this package? I am using R 2.11, but when I tried to install this package from within R, there was no such library called "sas7bdat". I did not find any useful information when I googled "sas7bdat R package" either. So, could anyone let me know how you installed this package? Thanks a lot!

Ken Kleinman said...

It should install just like any other package. You might need a newer version of R, though.

Anonymous said...

Thanks, Ken. I will try that!

Anonymous said...

May I know what the .sas7bdat extension means?

Ken Kleinman said...

That's the standard extension for SAS formatted data sets made with SAS versions 7 through at least 9.3.