OK, this is a pretty basic question, but I am having a hard time finding an answer to it. I have functions that return several results as a list, and I want to store that output list in a data frame. The data frame also holds the variables that are passed to the function. For example:
library(dplyr)
### function
testFunc <- function(a){
  a = a
  b = a+1
  c = list(out1=a, out2=b)
  return(c)
}
### data
dat <- data.frame(x=1:5)
### dplyr processing
datProcessed <- dat %>%
  mutate(calcd = testFunc(x))
### Fails, `calcd` must be size 5 or 1, not 2.
However, if the output is a single item, of course it works:
datProcessed <- dat %>%
  mutate(calcd = testFunc(x)$out2)
How do I store a list output in the dataframe column of lists using a dplyr pipe?
Here are some options, depending wholly on your expected output and what you're going to do with it next.
(BTW: I'm using tibble(dat) instead of dat only to make the difference between vector-columns and list-columns visible in the printed output; your production use does not need tibble(..).)
If you want both vectors returned from testFunc() as individual columns in dat, then we can just do
tibble(dat) |>
  mutate(as.data.frame(testFunc(x)))
# # A tibble: 5 × 3
# x out1 out2
# <int> <int> <dbl>
# 1 1 1 2
# 2 2 2 3
# 3 3 3 4
# 4 4 4 5
# 5 5 5 6
This works because mutate(.) (like other similar verb-functions in dplyr) splices in the columns of an unnamed argument when its value is itself a data frame. It does not do this for a named list, even though the two are structurally very similar.
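To make that distinction concrete, here is a minimal sketch (reusing a condensed testFunc() and the same dat as in the question) of what happens when the data frame is passed unnamed versus under a name. The object names spliced and packed are just illustrative.

```r
library(dplyr)

# Condensed version of the question's function and data.
testFunc <- function(a) list(out1 = a, out2 = a + 1)
dat <- data.frame(x = 1:5)

# Unnamed data-frame argument: its columns are spliced into the result.
spliced <- dat |> mutate(as.data.frame(testFunc(x)))
names(spliced)

# Named data-frame argument: the whole frame is kept as one packed column.
packed <- dat |> mutate(calcd = as.data.frame(testFunc(x)))
names(packed)
is.data.frame(packed$calcd)
```

So if you give the argument a name, you end up with a single data-frame column called calcd rather than separate out1 and out2 columns.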
If you want each of the pairs of the return values stored in a list-column per-row in dat, then we can use purrr::transpose:
out <- dat |>
  mutate(calcd = purrr::transpose(testFunc(x)))
out
# x calcd
# 1 1 1, 2
# 2 2 2, 3
# 3 3 3, 4
# 4 4 4, 5
# 5 5 5, 6
tibble(out)
# # A tibble: 5 × 2
# x calcd
# <int> <list>
# 1 1 <named list [2]>
# 2 2 <named list [2]>
# 3 3 <named list [2]>
# 4 4 <named list [2]>
# 5 5 <named list [2]>
out$calcd[[1]]
# $out1
# [1] 1
# $out2
# [1] 2
In this second form, each element in $calcd is a named list of two elements (out1 and out2), each holding a single value (based on how your testFunc(.) worked).
Both methods assume that the return from testFunc(.) is a named list of vectors where each vector is the same length as the number of rows.
If you aren't familiar with what purrr::transpose does, compare the structure before and after:
str(testFunc(dat$x))
# List of 2
# $ out1: int [1:5] 1 2 3 4 5
# $ out2: num [1:5] 2 3 4 5 6
str(purrr::transpose(testFunc(dat$x)))
# List of 5
# $ :List of 2
# ..$ out1: int 1
# ..$ out2: num 2
# $ :List of 2
# ..$ out1: int 2
# ..$ out2: num 3
# $ :List of 2
# ..$ out1: int 3
# ..$ out2: num 4
# $ :List of 2
# ..$ out1: int 4
# ..$ out2: num 5
# $ :List of 2
# ..$ out1: int 5
# ..$ out2: num 6
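If you start with the list-column form and later decide you want separate columns after all, tidyr::unnest_wider() should get you there. A minimal sketch, assuming tidyr is installed (it is not used elsewhere in this answer); nested and wide are illustrative names:

```r
library(dplyr)
library(tidyr)

# Condensed version of the question's function and data.
testFunc <- function(a) list(out1 = a, out2 = a + 1)
dat <- data.frame(x = 1:5)

# Build the list-column form: one named list (out1, out2) per row.
nested <- as_tibble(dat) |>
  mutate(calcd = purrr::transpose(testFunc(x)))

# Spread each named list back out into its own columns.
wide <- nested |> unnest_wider(calcd)
names(wide)
```

This gives the same columns as the as.data.frame() approach, so the two forms are interchangeable for this kind of output.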
You probably want to apply your function to each row individually, in which case you could do:
library(tidyverse)
dat %>%
  mutate(calcd = apply(across(x), 1, testFunc))
This returns:
x calcd
1 1 1, 2
2 2 2, 3
3 3 3, 4
4 4 4, 5
5 5 5, 6
Inspecting the result with str() shows the structure of the list-column:
'data.frame': 5 obs. of 2 variables:
$ x : int 1 2 3 4 5
$ calcd:List of 5
..$ :List of 2
.. ..$ out1: Named int 1
.. .. ..- attr(*, "names")= chr "x"
.. ..$ out2: Named num 2
.. .. ..- attr(*, "names")= chr "x"
..$ :List of 2
.. ..$ out1: Named int 2
.. .. ..- attr(*, "names")= chr "x"
.. ..$ out2: Named num 3
.. .. ..- attr(*, "names")= chr "x"
..$ :List of 2
.. ..$ out1: Named int 3
.. .. ..- attr(*, "names")= chr "x"
.. ..$ out2: Named num 4
.. .. ..- attr(*, "names")= chr "x"
..$ :List of 2
.. ..$ out1: Named int 4
.. .. ..- attr(*, "names")= chr "x"
.. ..$ out2: Named num 5
.. .. ..- attr(*, "names")= chr "x"
..$ :List of 2
.. ..$ out1: Named int 5
.. .. ..- attr(*, "names")= chr "x"
.. ..$ out2: Named num 6
.. .. ..- attr(*, "names")= chr "x"
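The Named int / Named num entries above appear because apply() hands each row to testFunc() as a named vector. If that is unwanted, one possible alternative (a sketch, not part of the answer above) is to call the function on each element of x with base lapply(); res is an illustrative name:

```r
library(dplyr)

# Condensed version of the question's function and data.
testFunc <- function(a) list(out1 = a, out2 = a + 1)
dat <- data.frame(x = 1:5)

# lapply() calls testFunc() once per element of x and returns a list of
# length nrow(dat), which mutate() stores as a list-column.
res <- dat %>%
  mutate(calcd = lapply(x, testFunc))

# Each element is a plain named list without the extra "x" names attribute.
str(res$calcd[[1]])
```

This also works when the function is not vectorized, since it only ever sees one value of x at a time.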