
Create a matrix of residual plots using purrr and ggplot

Tags:

r

ggplot2

purrr

Suppose I have the following dataframe:

library(tidyverse)
fit <- lm(speed ~ dist, data = cars)
select(broom::augment(fit), .fitted:.std.resid) -> dt
names(dt) <- substring(names(dt), 2)

I would like to create a grid of residual plots using purrr. For example, I have plotting functions for 2 diagnostic plots so far:

residual <- function(model) {
  ggplot(model, aes(fitted, resid)) +
    geom_point() +
    geom_hline(yintercept = 0) +
    geom_smooth(se = FALSE)
}

stdResidual <- function(model) {
  ggplot(model, aes(fitted, std.resid)) +
    geom_point() +
    geom_hline(yintercept = 0) +
    geom_smooth(se = FALSE)
}

I am storing these functions in a list-column of a tibble, and I plan to run each of them against the augmented dataset dt.

formulas <- tibble(charts = list(residual, stdResidual))
formulas
#> # A tibble: 2 x 1
#>   charts
#>   <list>
#> 1 <fun>
#> 2 <fun>

Now I need to pass dt to each element of the column charts in formulas. I am also trying to combine both plots using gridExtra, but for now I would be satisfied if I could at least render both of them. I think I should run something like

pwalk(list(dt, formulas), ???)

But I have no idea what function I should use in ??? to render the plots.

asked Sep 12 '17 04:09 by Dambo




1 Answer

Set up functions to plot each one, just like you did above:

diagplot_resid <- function(df) {
  ggplot(df, aes(.fitted, .resid)) +
    geom_hline(yintercept = 0) +
    geom_point() +
    geom_smooth(se = FALSE) +
    labs(x = "Fitted", y = "Residuals")
}

diagplot_stdres <- function(df) {
  # Standardized residuals can be negative, so take abs() before sqrt()
  ggplot(df, aes(.fitted, sqrt(abs(.std.resid)))) +
    geom_hline(yintercept = 0) +
    geom_point() +
    geom_smooth(se = FALSE) +
    labs(x = "Fitted", y = expression(sqrt(abs("Standardized residuals"))))
}

diagplot_qq <- function(df) {
  ggplot(df, aes(sample = .std.resid)) +
    geom_abline(slope = 1, intercept = 0, color = "black") +
    stat_qq() +
    labs(x = "Theoretical quantiles", y = "Standardized residuals")
}

Then put the functions in a list and pass the data frame in the second argument. invoke_map calls each function in the first list with the matching set of arguments from the second list. Because the second list has only one element (a list holding the data frame), that element is recycled, so every function is called with the same data frame.
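The recycling described above can be seen in isolation with ordinary summary functions. This is only a sketch: it uses map2 and do.call rather than invoke_map, and the names funs, args and results are made up for the illustration.

```r
library(purrr)

# Three functions, but only ONE set of arguments.
funs <- list(mean, median, max)
args <- list(list(c(1, 5, 9)))

# map2() recycles the length-1 list, so each function gets the same arguments.
results <- map2(funs, args, function(f, a) do.call(f, a))

# mean = 5, median = 5, max = 9
str(results)
```

The diagnostic plots below follow the same pattern, with the plotting functions in place of mean, median and max.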

fit <- lm(mpg ~ wt, data = mtcars)
# library(tidyverse) does not attach broom, so call augment() with its namespace
df_aug <- broom::augment(fit)

purrr::invoke_map(.f = list(diagplot_resid, diagplot_stdres, diagplot_qq), 
                  .x = list(list(df_aug))) %>% 
  gridExtra::grid.arrange(grobs = ., ncol = 2, 
                          top = paste("Diagnostic plots for",
                                      as.expression(fit$call)))
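Newer purrr releases (1.0.0 and later, as far as I know) mark invoke_map as deprecated, so a plain map call may be the safer choice there. The sketch below assumes ggplot2, purrr, broom, tibble and gridExtra are installed. The helpers plot_resid and plot_qq are shortened, hypothetical stand-ins for the functions above, and chart_tbl mirrors the list-column tibble from the question.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)
df_aug <- broom::augment(fit)

# Compact stand-ins for the diagnostic helpers (hypothetical names).
plot_resid <- function(df) {
  ggplot(df, aes(.fitted, .resid)) +
    geom_hline(yintercept = 0) +
    geom_point()
}
plot_qq <- function(df) {
  ggplot(df, aes(sample = .std.resid)) +
    stat_qq()
}

# map() walks over the list of functions and calls each one on the same data.
plots <- purrr::map(list(plot_resid, plot_qq), function(f) f(df_aug))

# The same call works when the functions live in a tibble list-column.
chart_tbl <- tibble::tibble(charts = list(plot_resid, plot_qq))
chart_tbl$plot <- purrr::map(chart_tbl$charts, function(f) f(df_aug))

# Arrange the resulting ggplot objects in a grid.
gridExtra::grid.arrange(grobs = plots, ncol = 2)
```

Either plots or chart_tbl$plot can be handed to grid.arrange, since both are plain lists of ggplot objects.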


answered Oct 03 '22 08:10 by Brian