Monday, February 13, 2012

RStudio in the cloud, for dummies

You can have your own cloud computing version of R, complete with RStudio. Why should you? It's cool! Plus, there's a lot more power out there than you can easily get on your own hardware. And, it's R in a web page. Run it from your tablet. Run it from work, even if you're not supposed to install software. Run it from your boyfriend's laptop while he's on a beer run.

This entry is largely made possible by the work of Louis Aslett, who's completing his doctoral work at Trinity College, University of Dublin. We had thought that running a cloud compute application was beyond our current technical abilities, but Louis' work makes it pretty easy to do. In this entry, we'll show you how. (Louis graciously vetted the text for this entry, but all errors are our responsibility.)

Start-up
1. Get an account with Amazon Web Services (AWS). This is slightly more involved than your ordinary Amazon account, but not a big deal. There are no fees unless you use the services. There's also a "free tier" which means that you start with 750 hours of usage per month for a year.* Effectively, they're giving you a free computer for a year.

2. Go to this handy page maintained by Louis. Click on the 32-bit link for your region. This is a shortcut that gets you the right AMI. (An AMI is an "Amazon Machine Image". Each of these is effectively an operating system with a bunch of pre-loaded software. The ones that Louis maintains have R and RStudio built into them, and have an additional feature we'll encounter later.) You can also find Louis' AMIs without his page.**

3. The shortcut from Louis' page brings you through the first steps of setting up. You'll next see a page in the "Request Instances Wizard". An "instance" consists of some virtual hardware for your chosen OS and software. It's effectively a computer in the cloud. The defaults on the wizard are fine, with one key exception, but we'll add a little detail about what they are.

a. Click Continue on that first page, as if you had reviewed the data. (You might notice that this is an Ubuntu OS. But if you're not a Linux user, don't fret-- you'll never see which OS is running.)
b. On the "Instance details" page, you can also click Continue. The main option here is choosing the virtual hardware the AMI will run on. (The defaults are fine.)
c. The next page is also "Instance details" and you can click through.
d. The next "Instance details" page lets you assign a name to the instance. This can be useful if you end up running several instances at the same time, but you can click through for now.
e. Click through the "Create Key Pair" page; this is also convenient if you're a heavy user, but not necessary.
f. The next step is to "Configure Firewall". This is where you do have to pay attention. Since you'll want to access your virtual machine via a browser, you need to allow HTTP access. To do this:

1) Click "Create a New Security Group".
2) Give it a name (like "RStudio").
3) Give it a description ("RStudio"). Both the name and the description are required.
4) In the "Port range" window, type 80. (Leave the source at 0.0.0.0/0, which means that you can connect from any IP address.)
5) Click "Add Rule". You should get a little blue box describing the rule.
6) Click Continue. On the following page, click "Back" and check that your new security group is selected.

g. Click "Launch". Your virtual machine is being started! There's a page with some links, which you can "Close".
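
If you're curious why a source of 0.0.0.0/0 in step f means "any IP address", here is a minimal sketch in R of how a source range is matched. The helper names ip_to_num and in_cidr are our own, made up for illustration, and the example addresses come from ranges reserved for documentation. This is only meant to show the idea, not how AWS implements its firewall.

```r
# Convert a dotted-quad IPv4 string to a number.
# Doubles are used because R integers overflow above 127.255.255.255.
ip_to_num <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# TRUE if `ip` falls inside the range written as "network/prefix".
in_cidr <- function(ip, cidr) {
  parts  <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  prefix <- as.numeric(parts[2])
  block  <- 2^(32 - prefix)   # number of addresses covered by the range
  floor(ip_to_num(ip) / block) == floor(ip_to_num(parts[1]) / block)
}

in_cidr("203.0.113.7",  "0.0.0.0/0")       # TRUE: a /0 range covers every address
in_cidr("203.0.113.7",  "203.0.113.0/24")  # TRUE: inside this narrower range
in_cidr("198.51.100.9", "203.0.113.0/24")  # FALSE: outside it
```

A prefix of 0 means that none of the 32 address bits have to match, which is why every address gets through on port 80. A narrower source range would limit who can reach the login page.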

Use
4. To use the new computer, click "Instances" on the "Navigation" panel of the AWS Management Console Amazon EC2 page. You'll see a row with an "empty" Name, and a State that is either "Pending" or "Running". (You might need to click refresh to see when it starts running.) When it's running, click on it. You get a bunch of information in the box below.

5. Scroll down to "Public DNS". Copy the DNS and paste it into the address bar in your browser. If all went well, you should see an RStudio login window. This is the genius of Louis' approach-- you never need to see the operating system. Use the username rstudio and password rstudio. In a moment, that beautiful, familiar RStudio interface appears!

6. For security, it makes sense to change your password. But since Louis wants to spare you the OS, he's cleverly built in a way to change it from within RStudio. Just change the "Password" in the "Welcome.r" file, then source it. You should probably avoid saving the "Welcome.r" file-- maybe just close it-- because saving it will result in your password being saved as plain text. Probably not a big risk, but why tempt fate?

7. You can close your browser and open the window again any time you like, from any browser you like, using your new password.

There's your R in the cloud! Use RStudio's built-in package installation tools to easily build your working environment.
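
If you'd rather type than click, the same thing can be done from the RStudio console. Here is a minimal sketch that installs only what's missing. The package names are just examples, and missing_packages is a helper name we made up for this entry.

```r
# Example list of packages; substitute whatever your own work needs.
wanted <- c("ggplot2", "plyr")

# Return the packages in `wanted` that are not yet installed.
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(wanted)

# Only install when run from an interactive session (such as the RStudio
# console), so that sourcing this file in a batch job does nothing surprising.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)   # installs onto the cloud instance
}
```

Packages installed this way live on the instance's disk, so you only need to do this once per instance.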

Management
Our understanding of the "Free Usage Tier" is that you can leave this on all the time for a year without incurring any charges. Amazing. But caveat emptor.

You should also know how billing works. According to the FAQ, for instances other than the "micro" version we used here (or for "micro" instances after your "Free Usage Tier" period is over), you're billed an hourly rate between when the instance starts running and when it's terminated. The "micro" Linux instance that we chose above will cost $0.02/hour after the free period is over. Still cheap to use for a few hours, but too costly to leave on all the time for fun.
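
To put numbers on that, here is the arithmetic in R, using only the figures quoted in this entry (750 free hours per month and $0.02/hour). Actual prices may differ, so treat this as a back-of-the-envelope sketch.

```r
free_hours_per_month <- 750    # free tier allowance quoted above
hourly_rate          <- 0.02   # USD per hour for a "micro" Linux instance

# Even the longest month has fewer hours than the free allowance,
# which is why one always-on micro instance stays free for the first year.
hours_in_longest_month <- 31 * 24                   # 744
hours_in_longest_month <= free_hours_per_month      # TRUE

# Cost of leaving the instance running once the free period is over.
monthly_cost <- hourly_rate * 24 * 30               # 14.40 USD for 30 days
yearly_cost  <- hourly_rate * 24 * 365              # 175.20 USD
```

A few hours of use costs pennies, while leaving the instance on all year would run to about $175.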

However, there is also a "stopped" state. The stopped state is important for the other aspect of billing: data storage. Storage costs something like $0.10 per GB per month. When your instance is "stopped" you don't pay the hourly instance charge, but you still pay the monthly data storage charge. The "free usage tier" includes 30 GB of storage for the first year. (Obviously, there are no charges when your instance is terminated, since you lose all stored data.) Louis' AMIs have only 2 GB of storage built in, so they will run cheaply once your free usage period is over.
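
Putting the two charges together, here is a rough sketch of a monthly bill after the free period, again using only the approximate figures from this entry. The function name monthly_bill is ours, and real bills may include other items we haven't considered.

```r
storage_rate <- 0.10   # USD per GB per month (approximate, as quoted above)
ami_size_gb  <- 2      # storage built into Louis' AMIs
hourly_rate  <- 0.02   # USD per running hour for a "micro" Linux instance

# Estimated monthly bill for an instance that runs `hours_running` hours
# and is stopped (not terminated) the rest of the time.
monthly_bill <- function(hours_running, size_gb = ami_size_gb) {
  hours_running * hourly_rate + size_gb * storage_rate
}

monthly_bill(0)         # stopped all month: storage only, 0.20 USD
monthly_bill(10)        # used for 10 hours: 0.40 USD
monthly_bill(24 * 30)   # left running for 30 days: 14.60 USD
```

So stopping the instance when you're not using it keeps the cost to the storage charge alone.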

As long as an instance is running, you retain all aspects of your session-- it's just as if you had a computer that you left on running RStudio all the time. An instance that is stopped will retain all the loaded packages and local objects, but you have to log into AWS to start it.

Amazon warns that instances will occasionally fail, and if that happens, you're supposed to be able to restart them, as if you had stopped them on purpose.*** But it might be a good idea to back things up.

Happy cloud computing!

* The free usage is limited to "micro" instances, such as we use here. For any other kind, the usual fees apply.

** To find the AMIs without Louis' handy web page, start the AWS management console, go to EC2, and click "Launch Instance". You'll get a page with some standard instances where you can click a radio button to "Launch Classic Wizard". Click that, then Community AMIs. Then search for rstudio, click the one with the RStudio and R version you want, and proceed as from step 2.

*** Louis says: "I've not experienced it first-hand as they've been reliable for me, but apparently the instance will disappear and the hard drive will be left hanging round. When you are in the Amazon console on the EC2 tab, if you look further down the left "Navigation" you'll see "Volumes" under the "Elastic Block Store" section. You can look there when your instance is running and see its hard drive which will say "attached" -- this becomes "available" if an instance fails. So, you need to create a new instance and then attach the drive to it and reboot."

8 comments:

Christopher Peters said...

Fantastic, I'm up and running!

Rebecca Wilde said...

Simple, clear and effective. Great post.
I will pass this on to others.

Thanks!
Rebecca

Anonymous said...

Great - how do you integrate your own data?

Best


Michael

Nick Horton said...

You can upload data from the Files tab on the bottom right.

More info on RStudio can be found, including their screenshots and quick introduction, on their website.

Anonymous said...

Hi Nick,

thanks!

>You can upload data from the Files >tab on the bottom right.
You mean the Files tab of RStudio that runs in the browser?

From "open file" I cannot reach my own HD, and load() does not seem to work.


Michael

Unknown said...

I was able to run R on AWS using your directions. Your directions were very clear and helpful!! I have been having issues starting up and using R-Shiny from R on AWS. When I try to get started I get this error:

> ~/shinyapp
Error: unexpected '/' in "~/"
> |-- ui.R
Error: unexpected '|' in "|"
> |-- server.R
Error: unexpected '|' in "|"

Do you have any suggestions to help?
Thank you for creating an easy way to use R on AWS!!
Breanna

Ken Kleinman said...

Glad you found this clear and helpful, Breanna.

I haven't yet tried to run Shiny on one of these, unfortunately. We'll have to run a follow-up to demonstrate that.

(BTW, your R lines look odd to me. Did you mean to do a setwd() and save the files from the demo Shiny?)

®γσ, ξηg(雷欧) said...

I tried to use PuTTY to install Java, but it failed. Any solution?
https://www.facebook.com/photo.php?fbid=1634226790147795&set=p.1634226790147795&type=1