
purrr::map_df with nested data.frame

Tags:

r

purrr

tidyr

I'd like to iterate over a series of dataframes and apply the same function to them all.

I'm trying this using tidyr::nest and purrr::map_df. Here's a reprex of the sort of thing I'm trying to achieve.

data(iris)
library(dplyr)  # group_by() comes from dplyr
library(purrr)
library(tidyr)

iris_df <- as.data.frame(iris)
my_var <- 2

my_fun <- function(df) {
  sum_df <- sum(df) + my_var
}

iris_df %>% group_by(Species) %>% nest() %>% map_df(.$data, my_fun)
# Error: Index 1 must have length 1

What am I doing wrong? Is there a different approach?

EDIT: To clarify my desired output: I'm aiming for a new column containing the function's output, e.g.

|Species|Data|my_function_output|
|:------|:---|:-----------------|
|setosa |<tibble>|509.1         |
mark asked Jun 25 '26 15:06



1 Answer

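The problem is that nest() gives you a data frame with a list-column, data, whose elements are data frames. You need to map (or sapply) over that data column, not over the whole nest() output. In your pipeline, %>% passes the nested data frame as the first argument of map_df, so .$data ends up in the function slot, which is where the error comes from. I use sapply below, but you could also use map_dbl. Plain map would give you a list-column, and map_df will not work here because it row-binds the results and so needs named output (a data frame or named vector) from each call, whereas my_fun returns a bare number.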

iris_df %>% 
  group_by(Species) %>% 
  nest() %>% 
  mutate(my_fun_out = sapply(data, my_fun))

# A tibble: 3 x 3
  Species    data              my_fun_out
  <fct>      <list>                 <dbl>
1 setosa     <tibble [50 x 4]>        509
2 versicolor <tibble [50 x 4]>        717
3 virginica  <tibble [50 x 4]>        859
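
For reference, here is a minimal sketch of the map_dbl variant mentioned above. It assumes dplyr is loaded alongside purrr and tidyr (group_by and mutate come from dplyr), and it writes my_fun with an explicit return value, which is my own tweak to the question's function and not a requirement.

```r
library(dplyr)
library(tidyr)
library(purrr)

my_var <- 2

# Same calculation as in the question, returning the value directly
my_fun <- function(df) {
  sum(df) + my_var
}

result <- iris %>%
  group_by(Species) %>%
  nest() %>%
  # map_dbl() calls my_fun on each nested data frame and
  # returns a plain double vector, one value per row
  mutate(my_fun_out = map_dbl(data, my_fun))

result
```

This should give the same my_fun_out column as the sapply version. The practical difference is that map_dbl throws an error if my_fun ever returns something other than a single number, whereas sapply would quietly hand back a list or matrix.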
IceCreamToucan answered Jun 28 '26 05:06
