Thursday, September 21, 2017

ggformula: another option for teaching graphics in R to beginners

A previous entry described an approach to teaching graphics in R that also “get[s] students doing powerful things quickly”, as David Robinson suggested.

In this guest blog entry, Randall Pruim offers an alternative way based on a different formula interface. Here's Randall: 

For a number of years I and several of my colleagues have been teaching R to beginners using an approach that includes a combination of
  • the lattice package for graphics,
  • several functions from the stats package for modeling (e.g., lm(), t.test()), and
  • the mosaic package for numerical summaries and for smoothing over edge cases and inconsistencies in the other two components.
Important in this approach is the syntactic similarity that the following “formula template” brings to all of these operations.  

    goal ( y ~ x , data = mydata, ... )

Many data analysis operations can be executed by filling in the four slots of this template (goal, y, x, and mydata) with the appropriate information for the desired task. This allows students to become fluent quickly with a powerful, coherent toolkit for data analysis.
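As a rough illustration of the template, here is a minimal sketch that fills in the same slots for a plot, a numerical summary, and two models. It assumes the mosaic and mosaicData packages are installed and borrows the HELPrct data and the age, substance, and sex variables that appear in the comments below; the object names are only for illustration.

```r
# Sketch of the formula template: goal(y ~ x, data = mydata, ...)
library(lattice)     # graphics
library(mosaic)      # formula-aware numerical summaries (assumed installed)
library(mosaicData)  # provides the HELPrct data set

# goal = bwplot: side-by-side boxplots with lattice
box_plot <- bwplot(age ~ substance, data = HELPrct)

# goal = mean: mosaic's formula-aware version gives one mean per group
group_means <- mean(age ~ substance, data = HELPrct)

# goal = lm: a linear model from the stats package
fit <- lm(age ~ substance, data = HELPrct)

# goal = t.test: two-sample t-test comparing the two sexes
tt <- t.test(age ~ sex, data = HELPrct)

print(box_plot)   # printing a lattice object draws the plot
group_means
```

Only the name of the function changes from line to line; the formula and the data argument keep the same shape throughout.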

Trouble in paradise
As the earlier post noted, the use of lattice has some drawbacks. While basic graphs like histograms, boxplots, scatterplots, and quantile-quantile plots are simple to make with lattice, it is challenging to combine these simple plots into more complex plots or to plot data from multiple data sources. Splitting data into subgroups and either overlaying with multiple colors or separating into sub-plots (facets) is easy, but labeling such plots is less convenient (and takes more space) than for the equivalent plots made with ggplot2. And in our experience, students generally find the look of ggplot2 graphics more appealing.
On the other hand, introducing ggplot2 into a first course is challenging. The syntax tends to be more verbose, so it takes up more of the limited space on projected images and course handouts. More importantly, the syntax is entirely unrelated to the syntax used for other aspects of the course. For those adopting a “Less Volume, More Creativity” approach, ggplot2 is tough to justify.
ggformula: The third-and-a-half way
Danny Kaplan and I recently introduced ggformula, an R package that provides a formula interface to ggplot2 graphics. Our hope is that this provides the best aspects of lattice (the formula interface and lighter syntax) and ggplot2 (modularity, layering, and better visual aesthetics).
For simple plots, the only thing that changes is the name of the plotting function. Each of these functions begins with gf_. Here are two examples, either of which could replace the side-by-side boxplots made with lattice in the previous post.
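A minimal sketch of two such calls, assuming the HELPrct data from the mosaicData package and the age and substance variables used in the comments below (the exact variables plotted in the earlier post are an assumption here):

```r
library(ggformula)
library(mosaicData)  # assumed source of the HELPrct data

# Same formula and data argument as the lattice call; only the function name changes
p_box    <- gf_boxplot(age ~ substance, data = HELPrct)
p_violin <- gf_violin(age ~ substance, data = HELPrct)

print(p_box)     # printing the object draws the plot
print(p_violin)
```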
We can even overlay these two types of plots to see how they compare. To do so, we simply place what I call the "then" operator (%>%, also commonly called a pipe) between the two layers and adjust the transparency so we can see both where they overlap.
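The overlay described above might look something like the following sketch. The alpha and fill values are arbitrary choices for illustration, and the data set and variables are the same assumed ones (HELPrct from mosaicData).

```r
library(ggformula)   # also makes the %>% operator available
library(mosaicData)

# First layer: boxplots; "then" add violins on top.
# The second layer inherits the formula and data from the first.
p_overlay <- gf_boxplot(age ~ substance, data = HELPrct, alpha = 0.4) %>%
  gf_violin(alpha = 0.3, fill = "navy")

print(p_overlay)
```

Note that fill = "navy" sets a constant color; mapping a variable to an attribute uses a one-sided formula instead, as in fill = ~ sex.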

Comparing groups
Groups can be compared either by overlaying multiple groups distinguishable by some attribute (e.g., color) or by creating multiple plots arranged in a grid rather than overlaying subgroups in the same space.
The ggformula package provides two ways to create these facets. The first uses | very much like lattice does. Notice that the gf_lm() layer inherits information from the gf_point() layer in these plots, saving some typing when the information is the same in multiple layers.
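Both kinds of comparison can be sketched as follows. The choice of cesd, age, and sex from HELPrct is an assumption made for illustration, not necessarily what the original plots showed.

```r
library(ggformula)
library(mosaicData)

# Overlay: map sex to color with a one-sided formula; gf_lm() inherits
# the formula, data, and color mapping, so each group gets its own line
p_color <- gf_point(cesd ~ age, data = HELPrct, color = ~ sex) %>%
  gf_lm()

# Facets: the | in the formula works much as it does in lattice
p_facet <- gf_point(cesd ~ age | sex, data = HELPrct) %>%
  gf_lm()

print(p_color)
print(p_facet)
```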

The second way adds facets with gf_facet_wrap() or gf_facet_grid() and can be more convenient for complex plots or when customization of facets is desired.
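A sketch of this second style, again with assumed variables from HELPrct:

```r
library(ggformula)
library(mosaicData)

# One panel per substance, wrapped into a grid
p_wrap <- gf_point(cesd ~ age, data = HELPrct) %>%
  gf_lm() %>%
  gf_facet_wrap(~ substance)

# Rows by sex, columns by substance
p_grid <- gf_point(cesd ~ age, data = HELPrct) %>%
  gf_lm() %>%
  gf_facet_grid(sex ~ substance)

print(p_wrap)
print(p_grid)
```

Because the faceting is a separate step, additional arguments can be passed to it without cluttering the formula of the main layer.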
Fitting into the tidyverse workflow
ggformula also fits into a tidyverse-style workflow (arguably better than ggplot2 itself does). Data can be piped into the initial call to a ggformula function and there is no need to switch between %>% and + when moving from data transformations to plot operations.
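For instance, a pipeline along these lines moves from a dplyr transformation straight into a plot. This is a sketch that assumes dplyr is installed and reuses the HELPrct filter shown in the comments below.

```r
library(dplyr)
library(ggformula)
library(mosaicData)

# The data frame is piped into the first gf_ call, so no data = argument
# is needed, and %>% is used from start to finish
p_males <- HELPrct %>%
  filter(sex == "male") %>%
  gf_boxplot(age ~ substance) %>%
  gf_labs(title = "Age by substance (male subjects)")

print(p_males)
```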
The “Less Volume, More Creativity” approach is based on a common formula template that has served well for several years, but the arrival of ggformula strengthens this approach by bringing a richer graphical system into reach for beginners without introducing new syntactical structures. The full range of ggplot2 features and customizations remains available, and the ggformula package vignettes and tutorials describe these in more detail.
-- Randall Pruim


Jan said...

This is awesome! I just wish I knew about this a week ago (just spent a week teaching my intro stats class a messy conglomerate of tidyverse and mosaic for plotting).

I love that the gf_ functions work with the %>% operator! Unfortunately, other formula-using functions from base R (like lm) or mosaic (like tally or favstats) do not work with the operator directly and need data = ., which can be rather confusing.

Randall Pruim said...

lm() is largely out of our control, and I don't think it is a good idea to write a replacement for lm().

For numerical summaries, take a look at df_stats(). I think you will find it does what you want and interoperates well with ggformula. In particular, it always returns a tidy data frame (hence the d in df_stats). Here is an example:

    HELPrct %>% filter(sex == "male") %>% df_stats(age ~ substance, mean, median)
    ##   substance mean_age median_age
    ## 1   alcohol 37.95035       38.0
    ## 2   cocaine 34.36036       33.0
    ## 3    heroin 33.05319       32.5
