Fit a different model for each row of a list-columns data frame

Tags:

What is the best way to fit different model formulae that vary by the row of a data frame with the list-columns data structure in tidyverse?

In R for Data Science, Hadley presents a terrific example of how to use the list-columns data structure and fit many models easily (http://r4ds.had.co.nz/many-models.html#gapminder). I am trying to find a way to fit many models with slightly different formulae. In the below example adapted from his original example, what is the best way to fit a different model for each continent?

library(gapminder)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

by_continent <- gapminder %>% 
  group_by(continent) %>% 
  nest()

by_continent <- by_continent %>% 
  mutate(model = map(data, ~lm(lifeExp ~ year, data = .)))

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma statistic      p.value    df
#     <fctr>     <dbl>         <dbl>     <dbl>     <dbl>        <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419  304.1298 6.922751e-51     2
#2    Europe 0.4984659     0.4970649 3.8530964  355.8099 1.344184e-55     2
#3    Africa 0.2987543     0.2976269 7.6685811  264.9929 6.780085e-50     2
#4  Americas 0.4626467     0.4608435 6.8618439  256.5699 4.354220e-42     2
#5   Oceania 0.9540678     0.9519800 0.8317499  456.9671 3.299327e-16     2
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
##   deviance <dbl>, df.residual <int>

I know I can do it by iterating through by_continent (not efficient as it estimates each model for every continent:

formulae <- list(
  Asia=~lm(lifeExp ~ year, data = .),
  Europe=~lm(lifeExp ~ year + pop, data = .),
  Africa=~lm(lifeExp ~ year + gdpPercap, data = .),
  Americas=~lm(lifeExp ~ year - 1, data = .),
  Oceania=~lm(lifeExp ~ year + pop + gdpPercap, data = .)
)

for (i in 1:nrow(by_continent)) {
  by_continent$model[[i]] <- map(by_continent$data, formulae[[i]])[[i]]
}

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma  statistic       p.value    df
#     <fctr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2
#2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3
#3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3
#4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1
#5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>,
##   deviance <dbl>, df.residual <int>

But is it possible to do this without following back to loop in base R (and avoiding fitting models I don't need)?

What I tried is something like this:

by_continent <- by_continent %>% 
left_join(tibble::enframe(formulae, name="continent", value="formula"))

by_continent %>% 
   mutate(model=map2(data, formula, est_model))

But I don't seem to be able to come up with an est_model function that works. I tried this function (h/t: https://gist.github.com/multidis/8138757) that doesn't work:

  est_model <- function(data, formula, ...) {
  mc <- match.call()
  m <- match(c("formula","data"), names(mc), 0L)
  mf <- mc[c(1L, m)]
  mf[[1L]] <- as.name("model.frame")
  mf <- eval(mf, parent.frame())
  data.st <- data.frame(mf)

  return(data.st)
}

(Admittedly, this is a contrived example. My actual case is that I have substantial observations missing key independent variables in my data, so I want to fit one model with all variables on complete observations and another with only a subset of the variables on the rest observations.)

UPDATE

I came up with an est_model function that works (though probably not efficient):

est_model <- function(data, formula, ...) {
  map(list(data), formula, ...)[[1]]
}

by_continent <- by_continent %>% 
   mutate(model=map2(data, formula, est_model))

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

## A tibble: 5 × 12
#  continent r.squared adj.r.squared     sigma  statistic       p.value    df
#      <chr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>
#1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2
#2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3
#3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3
#4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1
#5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4
## ... with 5 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
##   df.residual <int>

799

asked Dec 30 '16 23:12

LmW.

2 Answers

I find it is easier to make a list of model formula. each model was only fit once for the corresponding continent. I add a new column formula to the nested data to make sure that the formula and the continent are in the same order in case they are not.

formulae <- c(
    Asia= lifeExp ~ year,
    Europe= lifeExp ~ year + pop,
    Africa= lifeExp ~ year + gdpPercap,
    Americas= lifeExp ~ year - 1,
    Oceania= lifeExp ~ year + pop + gdpPercap
)

df <- gapminder %>%
    group_by(continent) %>%
    nest() %>%
    mutate(formula = formulae[as.character(continent)]) %>%
    mutate(model = map2(formula, data, ~ lm(.x, .y))) %>%
    mutate(glance=map(model, glance)) %>%
    unnest(glance, .drop=T)

# # A tibble: 5 × 12
#   continent r.squared adj.r.squared     sigma  statistic       p.value    df      logLik        AIC        BIC
#      <fctr>     <dbl>         <dbl>     <dbl>      <dbl>         <dbl> <int>       <dbl>      <dbl>      <dbl>
# 1      Asia 0.4356350     0.4342026 8.9244419   304.1298  6.922751e-51     2 -1427.65947 2861.31893 2873.26317
# 2    Europe 0.4984677     0.4956580 3.8584819   177.4093  3.186760e-54     3  -995.41016 1998.82033 2014.36475
# 3    Africa 0.4160797     0.4141991 7.0033542   221.2506  2.836552e-73     3 -2098.46089 4204.92179 4222.66639
# 4  Americas 0.9812082     0.9811453 8.9703814 15612.1901 4.227928e-260     1 -1083.35918 2170.71836 2178.12593
# 5   Oceania 0.9733268     0.9693258 0.6647653   243.2719  6.662577e-16     4   -22.06696   54.13392   60.02419
# # ... with 2 more variables: deviance <dbl>, df.residual <int>

answered Sep 21 '22 10:09

mt1022

I found purrr::modify_depth() that does what I want to do with est_model() in the original question. This is the solution I am now happy with:

library(gapminder)
library(tidyverse)
library(purrr)
library(broom)

fmlas <- tibble::tribble(
  ~continent, ~formula,
  "Asia", ~lm(lifeExp ~ year, data = .),
  "Europe", ~lm(lifeExp ~ year + pop, data = .),
  "Africa", ~lm(lifeExp ~ year + gdpPercap, data = .),
  "Americas", ~lm(lifeExp ~ year - 1, data = .),
  "Oceania", ~lm(lifeExp ~ year + pop + gdpPercap, data = .)
)

by_continent <- gapminder %>% 
  nest(-continent) %>%
  left_join(fmlas) %>%
  mutate(model=map2(data, formula, ~modify_depth(.x, 0, .y)))

by_continent %>% 
  mutate(glance=map(model, glance)) %>% 
  unnest(glance, .drop=T)

answered Sep 24 '22 10:09

LmW.

Related questions
                            
                                Running the Angular compiler-cli (ngc); Cannot find module '@angular/core'
                            
                                perl6 What is the best way to match any of a group of words?
                            
                                Center one element along with multiple siblings
                            
                                Xamarin No installed provisioning profiles match the installed iOS signing identities
                            
                                In Python, why do separate dictionary string values pass "in" equality checks? ( string Interning Experiment )
                            
                                How can I figure out why Linenoise is failing to install?
                            
                                JetEntityFramework vs EF 6?
                            
                                Loading Pictures from Firebase Storage to React
                            
                                Handling DB failed to connect in laravel?
                            
                                how to set scope in template?
                            
                                Clear content in xml brackets in all files in directory tree on Windows using Strawberry Perl and twig
                            
                                Is there a way to build vue.js components without using the npm server?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fit a different model for each row of a list-columns data frame

Tags:

LmW.

People also ask

2 Answers

mt1022

LmW.

Recent Activity

Donate For Us