## Thursday, September 21, 2017

### ggformula: another option for teaching graphics in R to beginners

A previous entry (http://sas-and-r.blogspot.com/2017/07/options-for-teaching-r-to-beginners.html) describes an approach to teaching graphics in R that also “get[s] students doing powerful things quickly”, as David Robinson suggested.

In this guest blog entry, Randall Pruim offers an alternative way based on a different formula interface. Here's Randall:

For a number of years, several of my colleagues and I have been teaching R to beginners using an approach that combines:
• the `lattice` package for graphics,
• several functions from the `stats` package for modeling (e.g., `lm()`, `t.test()`), and
• the `mosaic` package for numerical summaries and for smoothing over edge cases and inconsistencies in the other two components.
Important in this approach is the syntactic similarity that the following “formula template” brings to all of these operations.

`goal(y ~ x, data = mydata, ...)`

Many data analysis operations can be executed by filling in four pieces of information (goal, y, x, and mydata) with the appropriate information for the desired task. This allows students to become fluent quickly with a powerful, coherent toolkit for data analysis.
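
As a minimal sketch of the template in action, the lines below fill in `goal`, `y`, `x`, and `mydata` for four different tasks. The data set (`HELPrct` from the `mosaicData` package) and the variables `age` and `substance` are our own illustrative choices, not part of the template itself.

```r
library(lattice)     # bwplot()
library(ggformula)   # gf_boxplot()
library(mosaicData)  # HELPrct data (illustrative choice)
library(mosaic)      # formula-aware versions of mean(), favstats(), etc.

# goal(y ~ x, data = mydata): only `goal` changes from task to task.

# Graphics with lattice: side-by-side boxplots of age for each substance group
p_lattice <- bwplot(age ~ substance, data = HELPrct)

# Numerical summary with mosaic: mean age within each substance group
group_means <- mean(age ~ substance, data = HELPrct)

# Modeling with stats: linear model of age on substance
fit <- lm(age ~ substance, data = HELPrct)

# Graphics with ggformula: the same template, with ggplot2 output
p_gg <- gf_boxplot(age ~ substance, data = HELPrct)

print(group_means)
```

Only the function name in front of the parentheses changes; the formula and the `data` argument stay the same.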

As the earlier post noted, the use of `lattice` has some drawbacks. While basic graphs like histograms, boxplots, scatterplots, and quantile-quantile plots are simple to make with `lattice`, it is challenging to combine these simple plots into more complex plots or to plot data from multiple data sources. Splitting data into subgroups and either overlaying them in multiple colors or separating them into sub-plots (facets) is easy, but the labeling of such plots is not as convenient as (and takes more space than) the equivalent plots made with `ggplot2`. And in our experience, students generally find the look of `ggplot2` graphics more appealing.

On the other hand, introducing `ggplot2` into a first course is challenging. The syntax tends to be more verbose, so it takes up more of the limited space on projected images and course handouts. More importantly, the syntax is entirely unrelated to the syntax used for other aspects of the course. For those adopting a “Less Volume, More Creativity” approach, `ggplot2` is tough to justify.
#### ggformula: The third-and-a-half way
Danny Kaplan and I recently introduced `ggformula`, an R package that provides a formula interface to `ggplot2` graphics. Our hope is that it combines the best aspects of `lattice` (the formula interface and lighter syntax) and `ggplot2` (modularity, layering, and better visual aesthetics).
For simple plots, the only thing that changes is the name of the plotting function. Each of these functions begins with `gf_`. Here are two examples, either of which could replace the side-by-side boxplots made with `lattice` in the previous post.
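
A minimal sketch of two such calls, assuming the `HELPrct` data from `mosaicData` and boxplots of `age` by `substance` (the data and variables are our illustrative choices):

```r
library(ggformula)
library(mosaicData)  # HELPrct data (illustrative choice)

# Boxplots: the formula and data argument are exactly what lattice's bwplot() takes
p_box <- gf_boxplot(age ~ substance, data = HELPrct)

# Violin plots: only the function name changes
p_violin <- gf_violin(age ~ substance, data = HELPrct)

print(p_box)
print(p_violin)
```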
We can even overlay these two types of plots to see how they compare. To do so, we simply place what I call the "then" operator (`%>%`, also commonly called a pipe) between the two layers and adjust the transparency so that both layers remain visible where they overlap.
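
A minimal sketch of such an overlay, again assuming `HELPrct` from `mosaicData`; the `alpha`, `fill`, and `width` values are arbitrary illustrative settings:

```r
library(ggformula)
library(magrittr)    # %>%, the "then" operator
library(mosaicData)  # HELPrct data (illustrative choice)

# Semi-transparent violins (alpha = 0.3) with narrow boxplots layered on top.
# gf_boxplot() gets no formula or data: it inherits both from the first layer.
p_overlay <-
  gf_violin(age ~ substance, data = HELPrct, alpha = 0.3, fill = "skyblue") %>%
  gf_boxplot(width = 0.2)

print(p_overlay)
```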

#### Comparing groups
Groups can be compared either by overlaying multiple groups distinguishable by some attribute (e.g., color) or by creating multiple plots (facets) arranged in a grid. The `ggformula` package provides two ways to create these facets. The first uses `|` very much as `lattice` does. Notice that the `gf_lm()` layer inherits information from the `gf_point()` layer in these plots, saving some typing when the information is the same in multiple layers.
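
A minimal sketch of both styles of comparison, assuming `HELPrct` from `mosaicData`; the choice of `cesd`, `age`, `sex`, and `substance` is ours, for illustration only:

```r
library(ggformula)
library(magrittr)    # %>%
library(mosaicData)  # HELPrct data (illustrative choice)

# Overlaying subgroups: one panel, points and fitted lines colored by sex.
# gf_lm() inherits the formula, data, and color mapping from gf_point().
p_color <- gf_point(cesd ~ age, data = HELPrct, color = ~ sex) %>%
  gf_lm()

# Facets with `|`, as in lattice: one panel per substance group
p_facet <- gf_point(cesd ~ age | substance, data = HELPrct) %>%
  gf_lm()

print(p_color)
print(p_facet)
```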

The second way adds facets with `gf_facet_wrap()` or `gf_facet_grid()` and can be more convenient for complex plots or when customization of facets is desired.
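
A minimal sketch of the second approach, with the same illustrative data and variables as above; here the faceting is its own step in the chain rather than part of the formula:

```r
library(ggformula)
library(magrittr)    # %>%
library(mosaicData)  # HELPrct data (illustrative choice)

# gf_facet_wrap(): one panel per substance, wrapped into rows as needed
p_wrap <- gf_point(cesd ~ age, data = HELPrct) %>%
  gf_lm() %>%
  gf_facet_wrap(~ substance)

# gf_facet_grid(): rows by sex, columns by substance
p_grid <- gf_point(cesd ~ age, data = HELPrct) %>%
  gf_lm() %>%
  gf_facet_grid(sex ~ substance)

print(p_wrap)
print(p_grid)
```

Keeping the facet specification separate from the formula leaves room to pass further `facet_wrap()`/`facet_grid()` arguments when customization is needed.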
#### Fitting into the tidyverse workflow
`ggformula` also fits into a tidyverse-style workflow (arguably better than `ggplot2` itself does). Data can be piped into the initial call to a `ggformula` function, and there is no need to switch between `%>%` and `+` when moving from data transformations to plot operations.
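
A minimal sketch of such a chain, assuming `dplyr` for the data step and `HELPrct` from `mosaicData`; the filter condition, bin width, and labels are illustrative choices:

```r
library(dplyr)       # filter() and %>%
library(ggformula)
library(mosaicData)  # HELPrct data (illustrative choice)

# Data transformation and plotting in one chain, all joined by %>%:
# filter the rows, pipe the result into gf_histogram(), then add labels.
p_male <- HELPrct %>%
  filter(sex == "male") %>%
  gf_histogram(~ age, binwidth = 5) %>%
  gf_labs(title = "Age of male participants", x = "age (years)")

print(p_male)
```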
#### Summary
The “Less Volume, More Creativity” approach is based on a common formula template that has served us well for several years, but the arrival of `ggformula` strengthens this approach by bringing a richer graphical system within reach of beginners without introducing new syntactic structures. The full range of `ggplot2` features and customizations remains available, and the `ggformula` package vignettes and tutorials describe these in more detail.
-- Randall Pruim

Jan said...

This is awesome! I just wish I knew about this a week ago (just spent a week teaching my intro stats class a messy conglomerate of tidyverse and mosaic for plotting).

I love that the gf_ functions work with the %>% operator! Unfortunately, other formula-based functions from base R (like lm) or mosaic (like tally or favstats) do not work with the operator and need data = ., which can be rather confusing.

Randall Pruim said...

lm() is largely out of our control, and I don't think it is a good idea to write a replacement for lm().

For numerical summaries, take a look at df_stats(). I think you will find it does what you want and interoperates well with ggformula. In particular, it always returns a tidy data frame (hence the d in df_stats). Here is an example:

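```r
library(mosaic)  # df_stats() and the HELPrct data (via mosaicData)
library(dplyr)   # filter() and %>%

# Mean and median age of male participants, by substance
HELPrct %>% filter(sex == "male") %>% df_stats(age ~ substance, mean, median)
##   substance mean_age median_age
## 1   alcohol 37.95035       38.0
## 2   cocaine 34.36036       33.0
## 3    heroin 33.05319       32.5
```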
