Sunday, January 17, 2016

Set up RStudio in the cloud to work with GitHub

I love GitHub for version control and collaboration, though I'm no master of it. And the tools for integrating git and GitHub with RStudio are just amazing boons to productivity.

Unfortunately, my University-supplied computer does not play well with GitHub. Various directories are locked down, and I can't push or pull to GitHub directly from RStudio. I can't even use install_github() from the devtools package, which is needed for loading Shiny applications up to Shinyapps.io. I lived with this for a bit, using git from the desktop and rsconnect from a home computer. But what a PIA.

Then I remembered I know how to put RStudio in the cloud-- why not install R there, and make that be my GitHub solution?

It works great. The steps are below. In setting it up, I discovered that Digital Ocean has changed their set-up a little bit, so I update the earlier post as well.

1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: I will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in, which makes it easy. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit-- the cheapest machine is $.007 per hour, up to a $5 per month maximum.

2. On your Digital Ocean page, click "Create droplet". Click on "One-click Apps" and select "Docker (1.9.1 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu version, and might change over time.) Then a size (meaning cost/power) of machine and the region closest to you. You can ignore the settings. Give your new computer an arbitrary name. Then click "Create Droplet" at the bottom of the page.

3. It takes a few seconds for the droplet to spin up. Then you should see your droplet dashboard. If not, click "Droplets" from the top bar. Under "More", click "Access Console". This brings up a virtual terminal to your cloud computer. Log in (your username is root) using the password that digital ocean sent you when the droplet spun up.

4. Start your RStudio container by typing: docker run -d -p 8787:8787 -e ROOT=TRUE rocker/hadleyverse

You can replace hadleyverse with rstudio if you like, for a quicker first-time installation, but many R users will want enough of Hadley Wickham's packages that it makes sense to install this version. The -e ROOT=TRUE is crucial for our approach to installing git into the container, but see the comment below from Petr Simicek below for another way to do the same thing.

5. Log in to your Cloud-based RStudio. Find the IP address of your cloud computer on the droplet dashboard, and append :8787 to it, and just put it into your browser. For example: http://135.104.92.185:8787. Log in as user rstudio with password rstudio.

6. Install git, inside the Docker container. Inside RStudio, click Tools -> Shell.... Note: you have to use this shell, it's not the same as using the droplet terminal. Type: sudo apt-get update and then sudo apt-get install git-core to install git.

git likes to know who you are. To set git up, from the same shell prompt, type git config --global user.name "Your Handle" and git config --global user.email "an.email@somewhere.edu"

7. Close the shell, and in RStudio, set things up to work with GitHub:  Go to Tools -> Global Options -> Git/SVN.  Click on create RSA key.  You don't need a name for it.  Create it, close the window, then view it and copy it.



8. Open GitHub, go to your Profile, click "Edit Profile", "SSH keys".  Click "Add key", and just paste in the stuff you copied from RStudio in the previous step.




You're done!

To clone an existing repos from Github to your cloud machine, open a new project in RStudio, and select Version Control, then Git, and paste in the URL name that GitHub provides.  Then work away!




An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, other than as mentioned above, the aggregator is violating the terms by which we publish our work.

3 comments:

Petr Simecek said...

Interesting, I should try it. There is yet another way if you (for security reason) prefer not to give RStudio root privileges: You can log into your running Docker container.

1) Log into your DO machine and run 'docker ps' to get your docker container ID
2) docker exec -i -t YOUR_CONTAINER_ID /bin/bash
3) Your are now inside your container, so you can install git or - for example - add another RStudio user

Ken Kleinman said...

Thanks, Petr! That's helpful.

You can enter Petr's command into the Droplet console, and this would allow you to omit the "-e ROOT=TRUE" from your docker run command, and avoid the system/shell series of commands from within RStudio. That's probably a more efficient way to do it.

Unknown said...
This comment has been removed by the author.