Showing posts with label open source. Show all posts

Monday, February 28, 2011

Plug for RStudio: powerful, free, and easy to use interactive development environment for R




As a longtime SAS user, one obstacle for me in using R professionally has been figuring out a process for saving and testing code across several work sessions and for integrating code composition and execution. There are several integrated R environments available, including ESS, TINN-R, and others. However, each of these seemed to require a serious investment of time, and I never did get around to using them (nor did Nick, despite several good-faith attempts). Instead I used a clunky system of editing code in a text editor, then copying and pasting or sourcing it. This really inhibited my ability first to learn R and then to code in it efficiently.

Then Nick introduced me to the folks who have created RStudio. They are a small group of wicked smart programmers who know how to help other programmers be more efficient. They've now turned their attention to helping statisticians and other R users. RStudio, publicly available as of 2/28/2011, is a free, open source product. Its abilities are extremely broad, and I'm bound to miss something important in the brief description below, but suffice it to say that it's well worth your time to check it out. Neither Nick nor I have any vested interest in recommending it (though he's moved all of his teaching of introductory and intermediate statistics courses to it, along with his collaborative research projects).

RStudio is an integrated development environment for R that includes 1) text editing windows from which code can be submitted to the console and/or saved to the OS, 2) live lists of the objects in your workspace, 3) easily searchable infinite history with ability to insert from the history to the console or a text editing window, 4) tab completion in the console for objects, commands, and help, 5) interface with the OS for access to files, 6) help window with back and forward buttons, 7) package downloading, and 8) support for Sweave to facilitate reproducible analysis. Despite all these capabilities, RStudio is very easy to get started with.

There is also a server version, which you can access over the web if someone installs it and gives you access. If you're not familiar with this idea, it means you can work from most browsers (I was even able to use it on a Kindle). The server version saves your workspace from session to session, so you can work in exactly the same way, in exactly the same workspace (with a continuous history and all your objects), on whatever OS/CPU you have in front of you: Windows, Mac OS, Chrome, Linux. You can switch OS, you can shut your computer down, and RStudio comes up just as you left it. Forgot your laptop? No problem.

The standalone version is an ordinary downloadable program. It uses the existing R binaries on your Mac (OSX 10.5+), Windows (XP/Vista/7), Ubuntu or Fedora Linux machine. The local and server applications have the same interface.

For me, the most useful aspect has been the integrated editor, but each one of the items I listed above has saved me a great deal of time over the past few months. The integrated help alone might be reason enough to adopt it. As a consulting statistician, I find RStudio a huge leap forward. It changes R from an important tool which I have to be able to use into a plausible system in which to do all of my work. I really can't overstate its value to me. Go to http://www.rstudio.org/ to learn more, see screenshots, and download!

Monday, August 24, 2009

packages and CRANtastic

Additional functionality in R is added through packages, which consist of libraries of bundled functions, datasets, examples and help files that can be downloaded from CRAN (the Comprehensive R Archive Network). To download and install packages, use either the function install.packages() or the windowing interface under Packages and Data (Mac) or Packages (Windows) (see section B.6.1, p. 273).
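For readers who prefer the function to the menus, here is a minimal sketch of installing only those packages that are not already present. The helper names missing_packages() and install_if_missing() are made up for this illustration, and the mirror URL is an assumption; substitute whichever CRAN mirror you normally use.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet
# installed in any library on the current library path.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing. This step needs a
# network connection; the default mirror below is an assumption.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) install.packages(todo, repos = repos)
  invisible(todo)
}

# Packages such as "stats" and "utils" ship with R itself, so they are
# never reported as missing.
shipped <- missing_packages(c("stats", "utils"))
```

Calling install_if_missing("plyr") and then library(plyr) would fetch the package on first use and simply load it afterwards.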

As of August 2009, there were 1,907 packages on CRAN, up from 1,705 in March 2009 (see here for the current list). While each of these has met a minimal standard for inclusion, it is important to keep in mind that R packages are typically created by individuals or small groups, and are not endorsed by the R core group. As a result, they do not necessarily undergo the same level of testing and quality assurance that the core R system does.

CRANtastic is a free, open-source web application that allows users to search for, review and tag CRAN packages. It was created by Hadley Wickham and is currently being developed by Bjørn Mæland. It can tell you more about a package than its inclusion on CRAN alone does.

As an example, consider the entry for the plyr package. This is a set of tools written by Hadley that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The CRANtastic entry provides detailed release information, lists the author and maintainer, mentions that (as of August 23, 2009) 7 people have noted that they use it, and lists 4 ratings received overall (5 stars), with 3 ratings for documentation (5 stars). A user named eamani provided a review. A search for related packages, dependencies and reverse dependencies is also included.
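To make the "break down, operate, put back together" idea concrete, here is a small sketch that fits a separate regression at each site. It deliberately uses only base R (split(), lapply() and do.call(rbind, ...)) rather than plyr, so it runs without installing anything; the data frame obs and its columns are invented for the illustration.

```r
# Hypothetical toy data: a response `y` measured at two values of `x`
# at each of three sites.
obs <- data.frame(
  site = c("a", "a", "b", "b", "c", "c"),
  x    = c(1, 2, 1, 2, 1, 2),
  y    = c(2, 4, 3, 3, 5, 1)
)

# Split: one data frame per site.
pieces <- split(obs, obs$site)

# Apply: fit y ~ x within each piece and keep only the slope.
slopes <- lapply(pieces, function(d) {
  fit <- lm(y ~ x, data = d)
  data.frame(site = d$site[1], slope = unname(coef(fit)["x"]))
})

# Combine: stack the per-site results back into a single data frame.
result <- do.call(rbind, slopes)
```

With plyr installed, a call along the lines of ddply(obs, .(site), function(d) data.frame(slope = coef(lm(y ~ x, data = d))[["x"]])) should produce the same table in one step, which is the convenience the package is offering.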

While still new, with relatively few users, this website has great potential to help provide some guidance about packages. If it takes off as an active community, this could help provide a map to particularly useful routines to utilize within R.