
Add Column of Predicted Values to Data Frame with dplyr

Tags: r, dplyr

I have a data frame with a column of models, and I am trying to add a column of predicted values to it. A minimal example:

library(dplyr)

exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2*(5:1)),
                           groups = rep(LETTERS[1:2], each = 5))

# fit one linear model per group, then attach each group's model to its rows
models <- exampleTable %>% group_by(groups) %>% do(model = lm(y ~ x, data = .))
exampleTable <- left_join(as_tibble(exampleTable), models, by = "groups")

estimates <- exampleTable %>% rowwise() %>% do(Est = predict(.$model, newdata = .["x"]))

How can I add a column of numeric predictions to exampleTable? I tried using mutate to add the column directly, without success:

exampleTable <- exampleTable %>% rowwise() %>% mutate(data.frame(Pred = predict(.$model, newdata = .["x"])))

Error: no applicable method for 'predict' applied to an object of class "list"

For now I use bind_cols to attach the estimates to exampleTable, but I am looking for a better solution.

estimates <- exampleTable %>% rowwise() %>% do(data.frame(Pred = predict(.$model, newdata = .["x"])))
exampleTable <- bind_cols(exampleTable, estimates)

How can it be done in a single step?

Dario asked Sep 25 '15 02:09


2 Answers

The modelr package offers an elegant tidyverse solution.

The inputs

library(dplyr)
library(purrr)
library(tidyr)

# generate the inputs like in the question
example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2*(5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

models <- example_table %>% 
  group_by(groups) %>% 
  do(model = lm(y ~ x, data = .)) %>%
  ungroup()
example_table <- left_join(as_tibble(example_table), models, by = "groups")

The solution

# generate the extra column
example_table %>%
  group_by(groups) %>%
  do(modelr::add_predictions(., first(.$model)))

The explanation

add_predictions adds a new column to a data frame using a given model. Unfortunately, it takes only one model as an argument. Meet do: with do, we can run add_predictions on each group individually.

Inside do, . is the data frame for the current group, .$model is its model column, and first() takes the first model of that group (every row in a group holds the same model).
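
The same per-group idea can also be sketched without the stored model column, using dplyr's group_modify and base predict. This is only an illustrative alternative, not part of the answer above: the helper name add_group_pred, the column name pred, and the fixed seed are assumptions made for the sketch.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible

example_table <- data.frame(x = c(1:5, 1:5),
                            y = c((1:5) + rnorm(5), 2 * (5:1)),
                            groups = rep(LETTERS[1:2], each = 5))

# hypothetical helper: fit one lm on a group's rows and attach its predictions
# group_modify passes the group's rows (without the grouping column) and the key
add_group_pred <- function(group_df, key) {
  fit <- lm(y ~ x, data = group_df)
  group_df$pred <- unname(predict(fit, newdata = group_df))
  group_df
}

with_pred <- example_table %>%
  group_by(groups) %>%
  group_modify(add_group_pred) %>%
  ungroup()
```

Here the model is fitted and used inside the helper, so nothing has to be joined back onto the table; if the fitted models are needed later, the do-based approach that keeps the model column is the better fit.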

Simplified

With only one model, add_predictions works very well.

# take one of the models (row 6 belongs to group B);
# every row is then predicted with this single model
model <- example_table$model[[6]]

# generate the extra column
example_table %>%
  modelr::add_predictions(model)
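
For the single-model case, roughly the same result can be sketched in base R, assuming the goal is simply to store predict's output in a new column. The column name pred mirrors the default name used by add_predictions; the small data frame is made up for the illustration.

```r
# illustrative data: y is an exact linear function of x (y = 12 - 2x)
df <- data.frame(x = 1:5, y = 2 * (5:1))

# fit a single model and store its predictions as a new column
fit <- lm(y ~ x, data = df)
df$pred <- unname(predict(fit, newdata = df))
```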

Recipes

Nowadays the tidyverse is shifting from the modelr package to recipes, so that might be the way to go once the package matures.

takje answered Oct 06 '22 19:10


Using the tidyverse:

library(dplyr)
library(purrr)
library(tidyr)
library(broom)

exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2*(5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

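exampleTable %>% 
  group_by(groups) %>%
  nest() %>% 
  # one model per group, fitted on that group's nested data
  mutate(model = map(data, ~ lm(y ~ x, data = .x))) %>% 
  mutate(Pred = map2(model, data, predict)) %>% 
  # drop the model list-column, then expand predictions and data together
  select(-model) %>%
  unnest(c(Pred, data)) %>%
  ungroup()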

# A tibble: 10 × 4
   groups      Pred     x          y
   <fctr>     <dbl> <int>      <dbl>
1       A  1.284185     1  0.9305908
2       A  1.909262     2  1.9598293
3       A  2.534339     3  3.2812002
4       A  3.159415     4  2.9283637
5       A  3.784492     5  3.5717085
6       B 10.000000     1 10.0000000
7       B  8.000000     2  8.0000000
8       B  6.000000     3  6.0000000
9       B  4.000000     4  4.0000000
10      B  2.000000     5  2.0000000

Italo Cegatta answered Oct 06 '22 18:10