
Extract model summaries and store them as a new column

I'm new to the purrr paradigm and am struggling with it.

Following a few sources I have managed to get so far as to nest a data frame, run a linear model on the nested data, extract some coefficients from each lm, and generate a summary for each lm. The last thing I want to do is extract the "r.squared" from the summary (which I would have thought would be the simplest part of what I'm trying to achieve), but for whatever reason I can't get the syntax right.

Here's a MWE of what I have that works:

library(purrr)
library(dplyr)
library(tidyr)

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         sum = map(fit, ~summary))

and here's my attempt to extract the r.squared which fails:

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         sum = map(fit, ~summary),
         rsq = map_dbl(sum, "r.squared"))
Error in eval(substitute(expr), envir, enclos) : 
  `x` must be a vector (not a closure)

This is superficially similar to the example given on the RStudio site:

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

This works; however, I would like the r.squared values to sit in a new column (hence the mutate statement), and I'd like to understand why my code isn't working rather than work around the problem.

EDIT:

Here's a working solution that I came to using the solutions below:

library(broom)  # glance() comes from broom

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance),
         r_sq = map_dbl(summary, "r.squared"))

EDIT 2:

So, it turns out that the bug comes from including the tilde in the summary = map(fit, ~summary) line. My guess is that the tilde turns the argument into a function that returns summary itself, so what ends up nested in the list-column is the summary function and not the object returned by calling summary on each fit. Would love an authoritative answer on this if someone wants to chime in.

To be clear, this version of the original code works fine:

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, summary),
         r_sq = map_dbl(summary, "r.squared"))
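
The guess above can be checked directly. Here is a minimal sketch, assuming only purrr and the built-in mtcars data; the names fits, wrong, right and also_right are made up for illustration and are not part of the original code:

```r
library(purrr)

# One lm per cylinder group (illustrative setup, no nesting needed)
fits <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# `~summary` is a lambda whose body is just the symbol `summary`,
# so every element of the result is the function itself
wrong <- map(fits, ~summary)

# Calling summary on the placeholder, or passing the bare function,
# gives one summary.lm object per model
right      <- map(fits, ~summary(.x))
also_right <- map(fits, summary)

is.function(wrong[[1]])   # TRUE
class(right[[1]])         # "summary.lm"

# The string extractor now has a list element to pull out
map_dbl(right, "r.squared")
```

If this reading is right, the closure that map_dbl complained about is the summary function sitting in each cell of the list-column.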

niklz asked Dec 02 '16 10:12


1 Answer

To fit in your current pipe, you'd want to use unnest along with map and glance from the broom package.

library(tidyr)
library(dplyr)
library(purrr)  # map()
library(broom)  # glance()

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .))) %>% 
  unnest(map(fit, glance))

You'll get more than just the r-squared, and from there you can use select to drop what you don't need.
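
As a rough sketch of that select step: the version below stores the glance output in a column first and then unnests it by name. The column name glanced and the result name r_squared are hypothetical, and group_by plus nest is used here only as an assumption about what runs across tidyr versions, not as the form used elsewhere in this answer.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

r_squared <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(fit = map(data, ~ lm(mpg ~ wt, data = .x)),
         glanced = map(fit, glance)) %>%   # one-row data frame per model
  unnest(glanced) %>%                      # spread glance columns out
  select(cyl, r.squared)                   # keep only what is needed

r_squared
```

The result has one row per cyl value, with the other glance columns dropped.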

If you want to keep the model summaries nested in list-columns:

mtcars %>%
  nest(-cyl) %>% 
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance)) 

If you just want to extract a single value from each nested data frame, pass the name of that value to map_dbl (rather than using [[ or extract2 as I originally suggested; many thanks for finding that out).

mtcars %>%
  nest(-cyl) %>% 
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance),
         r_sq = map_dbl(summary, "r.squared"))
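
To show what the string shortcut does, here is a small sketch that needs only purrr. The one-row data frames are hand-built stand-ins for glance output (an assumption made to avoid depending on broom), and the names fits, one_row, by_name and by_lambda are illustrative only:

```r
library(purrr)

fits <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# Stand-in for glance(): a one-row data frame holding just r.squared
one_row <- map(fits, ~ data.frame(r.squared = summary(.x)$r.squared))

# Passing a string asks map_dbl to extract that element by name
by_name <- map_dbl(one_row, "r.squared")

# The same extraction written out as an explicit lambda
by_lambda <- map_dbl(one_row, ~ .x[["r.squared"]])

all.equal(by_name, by_lambda)
```

Both calls return a named numeric vector with one entry per cylinder group, which is why the result can sit in an ordinary column inside mutate.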

Jake Kaupp answered Oct 05 '22 18:10