Monday, August 27, 2018

Project MOSAIC migrates to ggformula

guest entry by Randall Pruim

In 2017, Project MOSAIC announced ggformula, a new package that provides a formula interface to ggplot2 graphics in R. (See, for example, ggformula: another option for teaching graphics in R to beginners.) This package provides a happy medium between lattice and ggplot2 that allows beginners to “do powerful things quickly” by adopting the formula interface of lattice and R’s statistical modeling functions as a means to produce ggplot2 graphics.

Over the past year, our experience with ggformula in our classes and in faculty development workshops, together with the feedback we have received from other users, has shown ggformula to be flexible yet easy to learn. As part of an ecosystem that emphasizes the formula interface of lattice and the core R statistical modeling functions early on and adds tidyverse concepts later, ggformula fits better with the rest of our toolkit than does either lattice or ggplot2, providing opportunities for more creativity with less volume.

The recent releases of several Project MOSAIC R packages (mosaic, mosaicData, mosaicCore, and ggformula) and the related fastR2 package mark the official migration of Project MOSAIC from lattice to ggformula as its primary graphics system. Future development includes plans to release an updated version of mosaicModel, which will interoperate with ggformula, and a new package called ggformulaExtra (currently available only via GitHub), which adds functionality but relies on packages beyond ggplot2.

Many of the recent changes to the Project MOSAIC suite of packages will go largely unnoticed by most users but were necessary to allow ggformula to interoperate with the newest version of ggplot2. Among the small number of more noticeable changes are a change in gf_smooth() so that it no longer displays confidence bands by default (use se = TRUE to turn them on), expanded support for “rugs”, support for horizontal versions of histograms, boxplots, and violin plots (using the ggstance package), and the addition of gf_sf() for improved support for choropleth maps (based on the new geom_sf() in ggplot2). Along the way, we also did some light housekeeping (improving documentation, etc.) and migrated most of our package examples from lattice to ggformula.
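The change to gf_smooth() might look like the following in practice. This is a minimal sketch, assuming the mosaic package (which loads ggformula and the KidsFeet data used later in this post) is installed; the object names p_default and p_band are only illustrative.

```r
library(mosaic)   # loads ggformula and mosaicData (KidsFeet)

# Default: a smoother is drawn without a confidence band
p_default <- gf_point(length ~ width, data = KidsFeet) %>%
  gf_smooth()

# se = TRUE turns the confidence band back on
p_band <- gf_point(length ~ width, data = KidsFeet) %>%
  gf_smooth(se = TRUE)
```

Printing p_default or p_band displays the corresponding plot.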

The basic form of the formula interface is

goal(y ~ x, data = myData)

which corresponds to SAS code like


Here, goal() can be replaced by a graphing function (e.g., gf_point()) or a modeling function (e.g., lm()), and the number of variables in the formula varies with the complexity of the plot or model desired.

library(mosaic)              # load the mosaic package (and ggformula)
gf_point(length ~ width, data = KidsFeet)                  # scatter plot 
      lm(length ~ width, data = KidsFeet) %>% msummary()   # linear model
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.8172     2.9381   3.341  0.00192 ** 
## width         1.6576     0.3262   5.081  1.1e-05 ***
## Residual standard error: 1.025 on 37 degrees of freedom
## Multiple R-squared:  0.411,  Adjusted R-squared:  0.3951 
## F-statistic: 25.82 on 1 and 37 DF,  p-value: 1.097e-05
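To illustrate how the formula grows with the complexity of the plot or model, here is a short sketch using the same KidsFeet data. The choice of sex as an additional variable is ours, made only for illustration.

```r
library(mosaic)   # loads ggformula and mosaicData (KidsFeet)

# One variable: the formula has only a right-hand side
p_hist <- gf_histogram(~ length, data = KidsFeet)

# Two variables: y ~ x
p_scatter <- gf_point(length ~ width, data = KidsFeet)

# Three variables: y ~ x | z adds one facet per level of z
p_facets <- gf_point(length ~ width | sex, data = KidsFeet)

# The same idea for a model: add a second predictor on the right-hand side
fit <- lm(length ~ width + sex, data = KidsFeet)
```

In each case only the formula and the name of the function change; the data argument stays the same.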

Users of lattice-based Project MOSAIC materials should have little trouble migrating to ggformula since the types of plots that were easiest to construct with lattice can be created very similarly using ggformula. For example, the following two commands are essentially equivalent (although the resulting plots have a different appearance).

    histogram( ~ age | sex, data = HELPrct,    width = 2, col  = "navy")
gf_dhistogram( ~ age | sex, data = HELPrct, binwidth = 2, fill = "navy")

It is much simpler, however, to create complex plots using ggformula because multiple layers can be stacked using the magrittr pipe (%>%, which we often read as “then”) familiar to users of the tidyverse suite of packages (and many others as well).

gf_jitter(Sepal.Length ~ Sepal.Width, data = iris, color = ~ Species) %>%
  gf_density2d(alpha = 0.4) %>%
  gf_jitter(geom = "rug", alpha = 0.7) %>%
  gf_lm(linetype = "dashed") %>%
  gf_refine(scale_color_brewer(type = "qual"))

As part of the migration to ggformula, a number of related resources have been or are being converted from lattice to ggformula as well. These include companion volumes for several popular statistics textbooks, our series of “Little Books”, the Minimal R Vignette, and a side-by-side comparison of lattice and ggformula. In addition, the second edition of Foundations and Applications of Statistics (Pruim, 2018) uses ggformula throughout.

An eventual migration from ggformula to native ggplot2, while not strictly necessary (since the same plots can be made in either system), is easier than the migration from lattice since the underlying grammar and much of the nomenclature of ggformula are borrowed from ggplot2. In the meantime, equivalent ggformula code is generally less verbose and simpler for novices to understand and produce. And the use of %>% for layering avoids the errors that creep in when moving between tidyverse, which also uses %>%, and ggplot2, which uses +. Indeed, data flows can be directed seamlessly into ggformula plotting commands. This can be useful as a debugging step when creating data pipelines or as a way to create a plot for which there is no need to save the pre-processed data.

Galton %>%
  filter(sex == "M") %>%      # select only male adult children
  group_by(family) %>%        # within each family ...
  sample_n(1) %>%             # ... choose only one male at random
  ungroup() %>%               # drop the grouping before standardizing
  mutate(                     # compute z-scores for parents' heights
    zfather = round(mosaic::zscore(father), 2),
    zmother = round(mosaic::zscore(mother), 2)
  ) %>% 
  gf_jitter(zfather ~ zmother, alpha = 0.5, 
            title = "Standardized heights of parents",
            caption = "Source: Galton")
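For readers curious about what the eventual move to native ggplot2 involves, the following sketch builds roughly the same plot in both systems. The ggplot2 version is our approximation: gf_lm() and geom_smooth(method = "lm", se = FALSE) both draw a least-squares line for each group, but they are not guaranteed to be identical in every detail.

```r
library(mosaic)    # loads ggformula and mosaicData (KidsFeet)
library(ggplot2)   # attached explicitly for the native version

# ggformula: formula interface, layers joined with %>%
p_gf <- gf_point(length ~ width, data = KidsFeet, color = ~ sex) %>%
  gf_lm()

# native ggplot2: aes() mapping, layers joined with +
p_gg <- ggplot(KidsFeet, aes(x = width, y = length, color = sex)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

Both objects are ordinary ggplot2 plots, so either can be refined further with ggplot2 functions.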

It has been over a year since I have used either lattice or ggplot2 for anything other than comparison examples. My co-authors and I have found the switch from lattice to ggformula to be both straightforward (for us) and advantageous (for our students). We encourage you to give it a try in your own work and with your students.

