I would like to use dplyr's mutate_if()
function to convert list-columns to data-frame-columns, but run into a puzzling error when I try to do so. I am using dplyr 0.5.0, purrr 0.2.2, R 3.3.0.
The basic setup looks like this: I have a data frame d
, some of whose columns are lists:
d <- dplyr::data_frame(
A = list(
list(list(x = "a", y = 1), list(x = "b", y = 2)),
list(list(x = "c", y = 3), list(x = "d", y = 4))
),
B = LETTERS[1:2]
)
I would like to convert the column of lists (in this case, d$A
) to a column of data frames using the following function:
tblfy <- function(x) {
x %>%
purrr::transpose() %>%
purrr::simplify_all() %>%
dplyr::as_data_frame()
}
That is, I would like the list-column d$A
to be replaced by the list lapply(d$A, tblfy)
, which is
[[1]]
# A tibble: 2 x 2
x y
<chr> <dbl>
1 a 1
2 b 2
[[2]]
# A tibble: 2 x 2
x y
<chr> <dbl>
1 c 3
2 d 4
Of course, in this simple case, I could just do a simple reassignment. The point, however, is that I would like to do this programmatically, ideally with dplyr, in a generally applicable way that could deal with any number of list-columns.
Here's where I stumble: When I try to convert the list-columns to data-frame-columns using the following application
d %>% dplyr::mutate_if(is.list, funs(tblfy))
I get an error message that I don't know how to interpret:
Error: Each variable must be named.
Problem variables: 1, 2
Why does mutate_if()
fail? How can I properly apply it to get the desired result?
Remark
A commenter has pointed out that the function tblfy()
should be vectorized. That is a reasonable suggestion. But — unless I have vectorized incorrectly — that does not seem to get at the root of the problem. Plugging in a vectorized version of tblfy()
,
tblfy_vec <- Vectorize(tblfy)
into mutate_if()
fails with the error
Error: wrong result size (4), expected 2 or 1
Update
After gaining some experience with purrr, I now find the following approach natural, if somewhat long-winded:
d %>%
map_if(is.list, ~ map(., ~ map_df(., identity))) %>%
as_data_frame()
This is more or less identical to @alistaire's solution, below, but uses map_if()
, resp. map()
, in place of mutate_if()
, resp. Vectorize()
.
The original tblfy
function errors out for me (even when its elements are chained directly), so let's rebuild it a bit, adding vectorization as well, which lets us avoid an otherwise-necessary prior rowwise()
call:
tblfy <- Vectorize(function(x){x %>% purrr::map_df(identity) %>% list()})
Now we can use mutate_if
nicely:
d %>% mutate_if(purrr::is_list, tblfy)
## Source: local data frame [2 x 2]
##
## A B
## <list> <chr>
## 1 <tbl_df [2,2]> A
## 2 <tbl_df [2,2]> B
...and if we unnest to see what's there,
d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
## Source: local data frame [4 x 3]
##
## B x y
## <chr> <chr> <dbl>
## 1 A a 1
## 2 A b 2
## 3 B c 3
## 4 B d 4
A couple notes:
map_df(identity)
seems to be more efficient at building a tibble than any of the alternative formulations. I know the identity
call seems unnecessary, but most everything else breaks.tblfy
will be, as it's somewhat dependent on the structure of the lists in the list column, which can vary enormously. If you have a lot with a similar structure, I suppose it's useful, though.pmap
instead of Vectorize
, but I can't get it to work with some cursory tries.In-place conversion without any copying:
library(data.table)
for (col in d) if (is.list(col)) lapply(col, setDF)
d
#Source: local data frame [2 x 2]
#
# A B
#1 <S3:data.frame> A
#2 <S3:data.frame> B
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With