 

apply/map a different function per row in a data frame with varying parameters

Tags: r, dplyr, purrr

I have a simple problem for the purrr experts out there that has eluded my best googling efforts for some time. First, let's look at the nested-list data structure I'm trying to work with.

Load packages

# R version 3.4.1
library(purrr) # version 0.2.4
library(dplyr) # version 0.7.4

Define functions

f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}

Define parameter sets

These are passed to f1, f2, and f3 respectively. Each row of a parameter set is meant to be one call:

p1 <- data_frame(a = c(2, 4, 5, 7, 8),
                 b = c(1, 1, 2, 2, 2),
                 c = c(.5, 5, 1, 2, 3))
p2 <- data_frame(x = c(1, 4))
p3 <- data_frame(y = c(2, 2, 2, 3),
                 z = c(5, 4, 3, 2))

Put them together into a nested dataframe

I am trying to keep my data wieldy, in a nice, neat rectangle. The fun_id column holds the function's name (in my real data, there are hundreds of these):

df <- data_frame(fun_id = c('f1', 'f2', 'f3'), 
                 params = list(p1, p2, p3), 
                 funs = list(f1, f2, f3))

Checking the structure shows us the list-columns for params and funs:

print(df)

# A tibble: 3 x 3
  fun_id           params   funs
   <chr>           <list> <list>
1     f1 <tibble [5 x 3]>  <fun>
2     f2 <tibble [2 x 1]>  <fun>
3     f3 <tibble [4 x 2]>  <fun>

My question

Using purrr functions, and perhaps dplyr::mutate, how do I add a new list-column called results to df, where each element holds the outputs of calling that row's function from funs once for each row of that row's params tibble?
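For reference, the intended output can be pinned down with a plain base-R loop before reaching for purrr. This is a minimal sketch, not a proposed answer: it assumes a current tibble release, so it uses tibble() in place of data_frame() (which newer tibble versions deprecate), and the loop variables are purely illustrative.

```r
library(tibble)

f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}

df <- tibble(
  fun_id = c("f1", "f2", "f3"),
  params = list(
    tibble(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2), c = c(.5, 5, 1, 2, 3)),
    tibble(x = c(1, 4)),
    tibble(y = c(2, 2, 2, 3), z = c(5, 4, 3, 2))
  ),
  funs = list(f1, f2, f3)
)

# For each row i of df, call funs[[i]] once per row j of params[[i]],
# passing that row's values as named arguments (a = ..., b = ..., c = ...).
results <- vector("list", nrow(df))
for (i in seq_len(nrow(df))) {
  p <- df$params[[i]]
  f <- df$funs[[i]]
  results[[i]] <- lapply(seq_len(nrow(p)), function(j) do.call(f, as.list(p[j, ])))
}

# Store the per-row outputs as a list-column.
df$results <- results
```

Any purrr solution should reproduce these values: 3, 5, 7, 11, 16 for f1; 2, 8 for f2; and 10, 8, 6, 6 for f3.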

I can get pmap to do what I want for a simple case:

> pmap(.l = p1, .f = f1)

[[1]]
[1] 3

[[2]]
[1] 5

[[3]]
[1] 7

[[4]]
[1] 11

[[5]]
[1] 16

But I really want to do this inside a data frame to keep everything straight. The following gets me the right structure (a data frame with a list-column for the results), but it only handles one row and doesn't generalize:

> df %>% 
  slice(1) %>% 
  mutate(results = list(pmap(.l = params[[1]], .f = funs[[1]])))

# A tibble: 1 x 4
  fun_id           params   funs    results
   <chr>           <list> <list>     <list>
1     f1 <tibble [5 x 3]>  <fun> <list [5]>
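One way to generalize that single-row attempt is to let dplyr do the per-row iteration. This is a sketch that assumes dplyr >= 1.0.0, where rowwise() makes mutate() evaluate once per row and unwraps list-columns to the element stored in that row. It also uses tibble() in place of the deprecated data_frame().

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for the rowwise() behaviour used below
library(purrr)
library(tibble)

f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}

df <- tibble(
  fun_id = c("f1", "f2", "f3"),
  params = list(
    tibble(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2), c = c(.5, 5, 1, 2, 3)),
    tibble(x = c(1, 4)),
    tibble(y = c(2, 2, 2, 3), z = c(5, 4, 3, 2))
  ),
  funs = list(f1, f2, f3)
)

# Inside a rowwise mutate(), params is that row's tibble and funs is that
# row's function, which replaces params[[1]] / funs[[1]] from the slice(1)
# attempt. list() wraps pmap()'s output so each row yields one element.
df2 <- df %>%
  rowwise() %>%
  mutate(results = list(pmap(params, funs))) %>%
  ungroup()
```

The final ungroup() drops the rowwise grouping so later verbs operate on the whole table again.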

Thanks in advance for the help rounding out my problem!

P.S. I have looked at the following resources, but haven't found an answer yet:

purrr::pmap with dplyr::mutate

Using purrr::pmap within mutate to create list-column

http://statwonk.com/purrr.html

https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf

https://jennybc.github.io/purrr-tutorial/index.html

asked Jul 13 '18 17:07 by vergilcw


2 Answers

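There is a convenience function in purrr for exactly this situation: applying a list of functions to a corresponding list of parameters. It's called invoke_map and can be used with mutate as below. Note that it calls each function once, passing the columns of its params tibble as named arguments (for example f1(a = p1$a, b = p1$b, c = p1$c)), rather than once per row. That gives the same values here because f1, f2, and f3 are vectorised, and it is why results holds numeric vectors instead of lists. I think the main advantage over map2(~pmap()) is that if any of the functions need additional parameters that are not in params, you can supply them as named arguments in ... (they are passed to every function) instead of modifying params.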

library(tidyverse)
f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}
p1 <- data_frame(
  a = c(2, 4, 5, 7, 8),
  b = c(1, 1, 2, 2, 2),
  c = c(.5, 5, 1, 2, 3)
)
p2 <- data_frame(x = c(1, 4))
p3 <- data_frame(
  y = c(2, 2, 2, 3),
  z = c(5, 4, 3, 2)
)
df <- data_frame(
  fun_id = c("f1", "f2", "f3"),
  params = list(p1, p2, p3),
  funs = list(f1, f2, f3)
)

df2 <- df %>%
  mutate(results = invoke_map(.f = funs, .x = params))
df2
#> # A tibble: 3 x 4
#>   fun_id params           funs   results  
#>   <chr>  <list>           <list> <list>   
#> 1 f1     <tibble [5 x 3]> <fn>   <dbl [5]>
#> 2 f2     <tibble [2 x 1]> <fn>   <dbl [2]>
#> 3 f3     <tibble [4 x 2]> <fn>   <dbl [4]>
df2$results
#> [[1]]
#> [1]  3  5  7 11 16
#> 
#> [[2]]
#> [1] 2 8
#> 
#> [[3]]
#> [1] 10  8  6  6

Created on 2018-07-13 by the reprex package (v0.2.0).
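In recent purrr releases the invoke family, including invoke_map(), is deprecated in favour of exec(). The sketch below reproduces the same call pattern with map2() and rlang::exec(). It assumes current purrr, dplyr, and rlang, and uses tibble() in place of data_frame(). Like invoke_map(), it calls each function once on whole columns, so it only matches a row-by-row pmap() when the functions are vectorised.

```r
library(dplyr)
library(purrr)
library(tibble)

f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}

df <- tibble(
  fun_id = c("f1", "f2", "f3"),
  params = list(
    tibble(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2), c = c(.5, 5, 1, 2, 3)),
    tibble(x = c(1, 4)),
    tibble(y = c(2, 2, 2, 3), z = c(5, 4, 3, 2))
  ),
  funs = list(f1, f2, f3)
)

# exec() splices the columns of p in as named arguments, i.e. one call such as
# f1(a = <5 values>, b = <5 values>, c = <5 values>).
# The base-R equivalent of the body is do.call(f, as.list(p)).
df2 <- df %>%
  mutate(results = map2(funs, params, function(f, p) rlang::exec(f, !!!as.list(p))))
```

Extra arguments shared by every function could be appended after the spliced list inside exec(), mirroring the ... argument of invoke_map().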

answered Nov 14 '22 20:11 by Calum You


We can use map2 to walk over params and funs in parallel and, for each pair, call pmap so that the function is applied to its parameter tibble row by row.

df2 <- df %>%
  mutate(result = map2(params, funs, ~pmap(.l = .x, .f = .y)))
df2
# # A tibble: 3 x 4
#   fun_id params           funs   result    
#   <chr>  <list>           <list> <list>    
# 1 f1     <tibble [5 x 3]> <fn>   <list [5]>
# 2 f2     <tibble [2 x 1]> <fn>   <list [2]>
# 3 f3     <tibble [4 x 2]> <fn>   <list [4]>
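If numeric vectors are preferred over lists of length-1 results, the inner pmap() can be swapped for pmap_dbl(). This is a sketch assuming current purrr and dplyr, with tibble() standing in for the deprecated data_frame(). pmap_dbl() still calls each function once per row, but it errors if any call returns something that is not a single number, which doubles as a light sanity check.

```r
library(dplyr)
library(purrr)
library(tibble)

f1 <- function(a, b, c) {a + b^c}
f2 <- function(x) {x * 2}
f3 <- function(y, z) {y * z}

df <- tibble(
  fun_id = c("f1", "f2", "f3"),
  params = list(
    tibble(a = c(2, 4, 5, 7, 8), b = c(1, 1, 2, 2, 2), c = c(.5, 5, 1, 2, 3)),
    tibble(x = c(1, 4)),
    tibble(y = c(2, 2, 2, 3), z = c(5, 4, 3, 2))
  ),
  funs = list(f1, f2, f3)
)

# Outer map2(): one (params, funs) pair per row of df.
# Inner pmap_dbl(): one call per row of that params tibble, simplified to a double vector.
df2 <- df %>%
  mutate(result = map2(params, funs, ~ pmap_dbl(.l = .x, .f = .y)))
```

With this change, the result column prints as <dbl [5]>, <dbl [2]>, and <dbl [4]> rather than list entries.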
answered Nov 14 '22 21:11 by www