Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting model coefficients from a nested list (list-columns)

I'm following the Many Models example from the R for Data Science book to generate individual multiple linear regression models. As illustrated in the reproducible example below, my data frame consists of three columns: id, val1, val2, val3,. I am able to fit linear models using the map function as detailed in Hadley's book. However, I've been struggling with extracting the coefficients from each of the models and adding these values as columns in the dataframes in my.list object. The way the model coefficients are currently stored, makes it difficult/cumbersome to call in other parts of my code.

The best I've come up with so far is to make a list the length of my.list and extract coefficients by iterating over each data frame in my.list: Name1, Name2, Name3. This means that now I have another list in my global environment and the coef.list no longer contains factors Name1, Name2, Name3 from my.list; these have now been replaced with [[1]], [[2]], [[3]].

Can anyone suggest a "cleaner" way of extracting model coefficients when working with multiple models? My preferred output would simply be to create a column for each coefficient: intercept, val1, val2. These columns would go in the existing data frames Name1, Name2, Name3 within my.list so that I can use mutate directly on the data frames:

# reproducible example
set.seed(1363)
d1 <- data.frame(id=c("Name1", "Name2", "Name3"),
             val1=c(rnorm(n=15, mean=5)), 
             val2=c(rnorm(n=15, mean=3)), 
             val3=c(rnorm(n=15, mean=8)))

# linear model function
lm.fun <- function(df){
  lm(val3 ~ val1+val2, data = df)
}

# map lm function
d1 <- d1 %>%
  group_by(id) %>%
  nest() %>%
  mutate(model = map(data, lm.fun)) %>%
  unnest(data, .drop = FALSE)

#split data frame by 'id' and convert into list
my.list <- split(d1, d1$id)

# make list of coefficients
coef.list <- list(length(my.list))
for (i in seq_along(my.list)) {
  coef.list[[i]] <- my.list[[i]][["model"]][[1]][["coefficients"]]
}

>head(coef.list, n=1)
[[1]]
(Intercept)        val1        val2 
9.03278337 -0.07096932  0.02119088 

DESIRED OUTPUT

my.list$Name1
id    val1  val2  val3  intc  coef1  coef2
Name1  1      2    3    9.03  -.070  .021   
Name1  3      1    5    9.03  -.070  .021
Name1  2      6    8    9.03  -.070  .021  
like image 819
On_an_island Avatar asked Jul 04 '18 23:07

On_an_island


3 Answers

Based on the description, if we need to create some columns with the coefficients, one option is using do with group_by. After grouping by 'id', extract the coefficients as a list within do and rename the columns (if needed)

library(tidyverse)
d1 %>% 
  group_by(id) %>% 
  do(data.frame(., as.list(coef(lm(val3 ~ val1 + val2, data = .))))) %>%
  rename_at(5:7, ~c("intc", "coef1", "coef2"))
# A tibble: 15 x 7
# Groups:   id [3]
#   id     val1  val2  val3  intc   coef1   coef2
#   <fct> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>
# 1 Name1  6.76  2.64  9.85  9.03 -0.0710  0.0212
# 2 Name1  6.78  1.52  6.94  9.03 -0.0710  0.0212
# 3 Name1  4.14  4.31  8.30  9.03 -0.0710  0.0212
# 4 Name1  5.55  2.16  9.97  9.03 -0.0710  0.0212
# 5 Name1  6.18  3.64  8.32  9.03 -0.0710  0.0212
# 6 Name2  6.08  1.12  9.96  7.33  0.468  -0.488 
# 7 Name2  3.54  4.71  6.43  7.33  0.468  -0.488 
# 8 Name2  5.66  4.54  8.69  7.33  0.468  -0.488 
# 9 Name2  6.88  4.15  7.79  7.33  0.468  -0.488 
#10 Name2  4.89  1.27  8.72  7.33  0.468  -0.488 
#11 Name3  6.41  4.38  6.22 20.1  -2.15    0.118 
#12 Name3  5.06  3.28  9.42 20.1  -2.15    0.118 
#13 Name3  6.25  3.16  8.15 20.1  -2.15    0.118 
#14 Name3  6.03  3.63  7.78 20.1  -2.15    0.118 
#15 Name3  6.51  1.46  5.90 20.1  -2.15    0.118 

Or we can use broom package functions. Functions like tidy, glance extract columns of interest (and more). After nesting the grouped dataset, build the model, extract the components with tidy, select the columns of interest and unnest

library(broom)
d1 %>% 
   group_by(id) %>%
   nest %>%
   mutate(model1 = map(data, ~ 
                         lm(val3 ~ val1 + val2, data = .) %>%
                              tidy %>%
                              dplyr::select(term, estimate) %>% 
                              spread(term, estimate) %>% 
                              rename_all(~ c("intc", paste0("coef", 1:2))))) %>%  
   unnest(model1) %>%
   unnest(data)
# A tibble: 15 x 7
#   id     intc   coef1   coef2  val1  val2  val3
#   <fct> <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
# 1 Name1  9.03 -0.0710  0.0212  6.76  2.64  9.85
# 2 Name1  9.03 -0.0710  0.0212  6.78  1.52  6.94
# 3 Name1  9.03 -0.0710  0.0212  4.14  4.31  8.30
# 4 Name1  9.03 -0.0710  0.0212  5.55  2.16  9.97
# 5 Name1  9.03 -0.0710  0.0212  6.18  3.64  8.32
# 6 Name2  7.33  0.468  -0.488   6.08  1.12  9.96
# 7 Name2  7.33  0.468  -0.488   3.54  4.71  6.43
# 8 Name2  7.33  0.468  -0.488   5.66  4.54  8.69
# 9 Name2  7.33  0.468  -0.488   6.88  4.15  7.79
#10 Name2  7.33  0.468  -0.488   4.89  1.27  8.72
#11 Name3 20.1  -2.15    0.118   6.41  4.38  6.22
#12 Name3 20.1  -2.15    0.118   5.06  3.28  9.42
#13 Name3 20.1  -2.15    0.118   6.25  3.16  8.15
#14 Name3 20.1  -2.15    0.118   6.03  3.63  7.78
#15 Name3 20.1  -2.15    0.118   6.51  1.46  5.90

Or without using broom

d1 %>%
    group_by(id) %>%
    nest %>%
    mutate(model = map(data, ~ lm(val3 ~ val1 + val2, data = .) %>%
                         coef %>% 
                         as.list %>%
                         as_tibble)) %>% 
    unnest(model) %>%
    unnest(data)

If we only need a summarised output for each 'id'

d1 %>% 
  group_by(id) %>% 
  nest %>%
  mutate(model1 = map(data, ~ 
                      lm(val3 ~ val1 + val2, data = .) %>%
                         tidy %>%
                         dplyr::select(term, estimate) %>%
                         spread(term, estimate) %>% 
                         rename_all(~ c("intc", paste0("coef", 1:2))))) %>%  
  dplyr::select(-data) %>% 
  unnest

Or with data.table, we can do this more concisely, after grouping by 'id' and assigning (:=) the new columns by extracting the coef of the model as a list

library(data.table)
setDT(d1)[, c('intc', 'coef1', 'coef2') := 
           as.list(coef(lm(val3 ~ val1 + val2))), id]
d1[order(id)]
#       id     val1     val2     val3      intc       coef1       coef2
# 1: Name1 6.755964 2.642874 9.849828  9.032783 -0.07096932  0.02119088
# 2: Name1 6.776666 1.522431 6.937053  9.032783 -0.07096932  0.02119088
# 3: Name1 4.141883 4.307537 8.301940  9.032783 -0.07096932  0.02119088
# 4: Name1 5.551850 2.163882 9.971588  9.032783 -0.07096932  0.02119088
# 5: Name1 6.179506 3.635832 8.319042  9.032783 -0.07096932  0.02119088
# 6: Name2 6.083243 1.116293 9.960934  7.325156  0.46840770 -0.48806159
# 7: Name2 3.536476 4.708967 6.427627  7.325156  0.46840770 -0.48806159
# 8: Name2 5.663909 4.541081 8.691523  7.325156  0.46840770 -0.48806159
# 9: Name2 6.883746 4.150780 7.791050  7.325156  0.46840770 -0.48806159
#10: Name2 4.890291 1.269559 8.723792  7.325156  0.46840770 -0.48806159
#11: Name3 6.414915 4.383609 6.220188 20.106581 -2.14601530  0.11770877
#12: Name3 5.059774 3.276510 9.421862 20.106581 -2.14601530  0.11770877
#13: Name3 6.251416 3.157157 8.147720 20.106581 -2.14601530  0.11770877
#14: Name3 6.028100 3.630858 7.783118 20.106581 -2.14601530  0.11770877
#15: Name3 6.505153 1.460564 5.895564 20.106581 -2.14601530  0.11770877

Or do a join if we don't want to update the initial data

setDT(d1)[, as.list(coef(lm(val3 ~ val1 + val2))), id][d1, on = .(id)]

NOTE: 'd1' is the initial dataset

like image 85
akrun Avatar answered Sep 19 '22 00:09

akrun


Using sapply, given the data d1 (containing the model column):

coeff <- sapply(d1$model, function(x) return(coefficients(x)))

library(dplyr)
bind_cols(d1, data.frame(t(coeff))) %>% 
     rename_at(6:8, ~ c("intc",  "coef1",  "coef2")) %>% 
          distinct(id, .keep_all = TRUE)
like image 33
Mankind_008 Avatar answered Sep 21 '22 00:09

Mankind_008


library(tidyverse)

d1%>%
   group_by(id)%>%
   mutate(s=toString(coef(lm(val3~val1+val2))))%>%
   separate(s,c("intercept","coef1","coef2"),sep=",",convert = T)%>%
   arrange(id)
# A tibble: 15 x 7
# Groups:   id [3]
   id     val1  val2  val3 intercept   coef1   coef2
   <fct> <dbl> <dbl> <dbl>     <dbl>   <dbl>   <dbl>
 1 Name1  6.76  2.64  9.85      9.03 -0.0710  0.0212
 2 Name1  6.78  1.52  6.94      9.03 -0.0710  0.0212
 3 Name1  4.14  4.31  8.30      9.03 -0.0710  0.0212
 4 Name1  5.55  2.16  9.97      9.03 -0.0710  0.0212
 5 Name1  6.18  3.64  8.32      9.03 -0.0710  0.0212
 6 Name2  6.08  1.12  9.96      7.33  0.468  -0.488 
 7 Name2  3.54  4.71  6.43      7.33  0.468  -0.488 
 8 Name2  5.66  4.54  8.69      7.33  0.468  -0.488 
 9 Name2  6.88  4.15  7.79      7.33  0.468  -0.488 
10 Name2  4.89  1.27  8.72      7.33  0.468  -0.488 
11 Name3  6.41  4.38  6.22     20.1  -2.15    0.118 
12 Name3  5.06  3.28  9.42     20.1  -2.15    0.118 
13 Name3  6.25  3.16  8.15     20.1  -2.15    0.118 
14 Name3  6.03  3.63  7.78     20.1  -2.15    0.118 
15 Name3  6.51  1.46  5.90     20.1  -2.15    0.118 
like image 45
KU99 Avatar answered Sep 22 '22 00:09

KU99