## Project MOSAIC migrates to ggformula

*guest entry by Randall Pruim*

In 2017, Project MOSAIC announced `ggformula`, a new package that provides a formula interface to `ggplot2` graphics in R. (See, for example, *ggformula: another option for teaching graphics in R to beginners*.) This package provides a happy medium between `lattice` and `ggplot2` that allows beginners to “do powerful things quickly” by adopting the formula interface of `lattice` and R’s statistical modeling functions as a means to produce `ggplot2` graphics.

Over the past year, our experience with `ggformula` in our classes and in faculty development workshops, together with the feedback we have received from other users, has demonstrated `ggformula` to be flexible yet easy to learn. As part of an ecosystem that emphasizes the formula interface of `lattice` and the core R statistical modeling functions early on and adds `tidyverse` concepts later, `ggformula` fits better with the rest of our toolkit than does either `lattice` or `ggplot2`, providing opportunities for more creativity with less volume.

The recent releases of several Project MOSAIC R packages (`mosaic`, `mosaicData`, `mosaicCore`, and `ggformula`) and the related `fastR2` package mark the official migration of Project MOSAIC from `lattice` to `ggformula` as its primary graphics system. Future development includes plans to release an updated version of `mosaicModel`, which will interoperate with `ggformula`, and a new package called `ggformulaExtra` (currently only available via GitHub), which adds functionality but relies on additional packages beyond `ggplot2`.

Many of the recent changes to the Project MOSAIC suite of packages will go largely unnoticed by most users but were necessary to allow `ggformula` to interoperate with the newest version of `ggplot2`. Among the small number of more noticeable changes are a change in `gf_smooth()` so that it no longer displays confidence bands by default (use `se = TRUE` to turn them on); expanded support for “rugs”; support for horizontal versions of histograms, boxplots, and violin plots (using the `ggstance` package); and the addition of `gf_sf()` for improved support for choropleth maps (based on the new `geom_sf()` in `ggplot2`). Along the way, we also did some light housekeeping (improving documentation, etc.) and migrated most of our package examples from `lattice` to `ggformula`.

The basic form of the formula interface is

`goal(y ~ x, data = myData)`

which corresponds to SAS code like

`PROC GOAL DATA = MYDATA; MODEL Y = X; RUN;`

Here `goal()` can be replaced by a graphing function (e.g., `gf_point()`) or a modeling function (e.g., `lm()`), with the number of variables in the formula varying with the complexity of the plot or model desired.

```
library(mosaic) # load the mosaic package (and ggformula)
gf_point(length ~ width, data = KidsFeet) # scatter plot
lm(length ~ width, data = KidsFeet) %>% msummary() # linear model
```

```
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.8172     2.9381   3.341  0.00192 **
## width         1.6576     0.3262   5.081  1.1e-05 ***
##
## Residual standard error: 1.025 on 37 degrees of freedom
## Multiple R-squared: 0.411, Adjusted R-squared: 0.3951
## F-statistic: 25.82 on 1 and 37 DF, p-value: 1.097e-05
```
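As a rough sketch of how the formula grows with the complexity of the plot or model, the calls below use the `KidsFeet` data from the example above. The particular variable choices are ours, for illustration only; the formula shapes (`~ x`, `y ~ x`, `y ~ x | z`) and the `color = ~ sex` mapping are the pieces of interest.

```r
library(mosaic)  # also attaches ggformula and mosaicData

# One-sided formula: a single variable, placed on the x-axis
gf_histogram(~ length, data = KidsFeet, bins = 10)

# Two-sided formula: y ~ x
gf_boxplot(length ~ sex, data = KidsFeet)

# A condition after | requests one facet per level of sex
gf_point(length ~ width | sex, data = KidsFeet)

# A one-sided formula on an attribute maps that attribute to a variable
gf_point(length ~ width, data = KidsFeet, color = ~ sex)

# The same formula idea extends to a model with two predictors
lm(length ~ width + sex, data = KidsFeet)
```

Writing `color = "navy"` sets a fixed color, whereas `color = ~ sex` maps color to the variable `sex`.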

Users of `lattice`-based Project MOSAIC materials should have little trouble migrating to `ggformula`, since the types of plots that were easiest to construct with `lattice` can be created very similarly using `ggformula`. For example, the following two commands are essentially equivalent (although the resulting plots have a different appearance).

```
histogram( ~ age | sex, data = HELPrct, width = 2, col = "navy")
gf_dhistogram( ~ age | sex, data = HELPrct, binwidth = 2, fill = "navy")
```
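A few more pairings in the same spirit may help when translating older materials. These are our own illustrative choices rather than an official translation table, and each pair produces plots that are comparable but not identical in appearance.

```r
library(mosaic)   # attaches ggformula and mosaicData
library(lattice)  # loaded explicitly in case it is not already attached

# Scatter plot with one panel per sex
xyplot(length ~ width | sex, data = KidsFeet)
gf_point(length ~ width | sex, data = KidsFeet)

# Side-by-side boxplots
bwplot(length ~ sex, data = KidsFeet)
gf_boxplot(length ~ sex, data = KidsFeet)

# Overlaid density curves, one per group
densityplot(~ age, groups = sex, data = HELPrct)
gf_dens(~ age, data = HELPrct, color = ~ sex)
```

In each pair the formula is unchanged. What differs is the function name and the way groups are requested (`groups = sex` in `lattice`, `color = ~ sex` in `ggformula`).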

It is much simpler, however, to create complex plots using `ggformula` because multiple layers can be stacked using the `magrittr` pipe (`%>%`, which we often read as “then”) familiar to users of the `tidyverse` suite of packages (and many others as well).

```
gf_jitter(Sepal.Length ~ Sepal.Width, data = iris, color = ~ Species) %>%
  gf_density2d(alpha = 0.4) %>%
  gf_jitter(geom = "rug", alpha = 0.7) %>%
  gf_lm(linetype = "dashed") %>%
  gf_refine(scale_color_brewer(type = "qual"))
```
```
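A smaller two-layer example may make the layering idea, and the `gf_smooth()` default mentioned earlier, easier to see. This is only a sketch. The object names `p_plain` and `p_band` are ours, and the data set is chosen for convenience.

```r
library(mosaic)  # attaches ggformula and mosaicData

# Layer a smoother on a scatter plot; no confidence band is drawn by default
p_plain <- gf_point(length ~ width, data = KidsFeet) %>%
  gf_smooth()

# The same plot with the confidence band turned back on
p_band <- gf_point(length ~ width, data = KidsFeet) %>%
  gf_smooth(se = TRUE)

p_plain
p_band
```

The second layer inherits the formula and data from the first, so `gf_smooth()` needs only the arguments that differ.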

Along with the packages' migration to `ggformula`, a number of related resources have been or are being converted from `lattice` to `ggformula` as well. These include companion volumes for several popular statistics textbooks, our series of “Little Books”, the *Minimal R Vignette*, and a side-by-side comparison of `lattice` and `ggformula`. In addition, the second edition of *Foundations and Applications of Statistics* (Pruim, 2018) uses `ggformula` throughout.

An eventual migration from `ggformula` to native `ggplot2`, while not strictly necessary (since the same plots can be made in either system), is easier than the migration from `lattice`, since the underlying grammar and much of the nomenclature of `ggformula` are borrowed from `ggplot2`. In the meantime, equivalent `ggformula` code is generally less verbose and simpler for novices to understand and produce. And the use of `%>%` for layering avoids the errors that creep in when moving between `tidyverse`, which also uses `%>%`, and `ggplot2`, which uses `+`. Indeed, data flows can be directed seamlessly into `ggformula` plotting commands. This can be useful as a debugging step when creating data pipelines or as a way to create a plot for which there is no need to save the pre-processed data.

```
Galton %>%
  filter(sex == "M") %>%   # select only male adult children
  group_by(family) %>%     # work family by family
  sample_n(1) %>%          # choose only one male from each family
  ungroup() %>%            # drop the grouping again
  mutate(                  # compute z-scores for parents' heights
    zfather = round(mosaic::zscore(father), 2),
    zmother = round(mosaic::zscore(mother), 2)
  ) %>%
  gf_jitter(zfather ~ zmother, alpha = 0.5,
            title = "Standardized heights of parents",
            caption = "Source: Galton") %>%
  gf_lm()
```
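To make the `%>%` versus `+` point concrete, here is a sketch of one small pipeline written both ways. The `ggplot2` version is our own rough translation, not taken from the packages' documentation, and `p_formula` and `p_native` are names chosen for illustration.

```r
library(mosaic)   # attaches dplyr, ggformula, and mosaicData
library(ggplot2)  # loaded explicitly in case it is not already attached

# ggformula: the pipe is used from data wrangling through plotting
p_formula <- KidsFeet %>%
  filter(sex == "B") %>%
  gf_point(length ~ width) %>%
  gf_lm()

# ggplot2: the pipe feeds ggplot(), after which layers are added with +
p_native <- KidsFeet %>%
  filter(sex == "B") %>%
  ggplot(aes(x = width, y = length)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_formula
p_native
```

In the second version, continuing with `%>%` after `ggplot()` where `+` is required is exactly the kind of slip the paragraph above has in mind.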

It has been over a year since I have used either `lattice` or `ggplot2` for anything other than comparison examples. My co-authors and I have found the switch from `lattice` to `ggformula` to be both straightforward (for us) and advantageous (for our students). We encourage you to give it a try in your own work and with your students.