Mutating columns of a data frame based on a predicate function (dplyr::mutate_if)

Question

I would like to use dplyr's mutate_if() function to convert list-columns to data-frame-columns, but run into a puzzling error when I try to do so. I am using dplyr 0.5.0, purrr 0.2.2, R 3.3.0.

The basic setup looks like this: I have a data frame d, some of whose columns are lists:

library(dplyr)  # attaches %>%, funs(), and mutate_if(), which are used unqualified below

d <- dplyr::data_frame(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

I would like to convert the column of lists (in this case, d$A) to a column of data frames using the following function:

tblfy <- function(x) {
  x %>%
    purrr::transpose() %>%
    purrr::simplify_all() %>%
    dplyr::as_data_frame()
}

That is, I would like the list-column d$A to be replaced by the list lapply(d$A, tblfy), which is

[[1]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     a     1
2     b     2

[[2]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     c     3
2     d     4

Of course, in this simple case, I could just do a simple reassignment. The point, however, is that I would like to do this programmatically, ideally with dplyr, in a generally applicable way that could deal with any number of list-columns.

Here's where I stumble: When I try to convert the list-columns to data-frame-columns using the following application

d %>% dplyr::mutate_if(is.list, funs(tblfy))

I get an error message that I don't know how to interpret:

Error: Each variable must be named.
Problem variables: 1, 2

Why does mutate_if() fail? How can I properly apply it to get the desired result?
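
One way to see what is going on, offered as a hedged sketch rather than a definitive diagnosis: mutate_if() calls the supplied function once per selected column and hands it the whole column, so tblfy() receives all of d$A at once instead of one element at a time. Wrapping a per-element converter in lapply() reproduces the lapply(d$A, tblfy) result described above. The names tblfy_one and tblfy_col are hypothetical, bind_rows() is swapped in for the transpose/simplify pipeline, and tibble::tibble() assumes a reasonably recent tibble/dplyr rather than the 0.5.0 release mentioned in the question.

```r
library(dplyr)

d <- tibble::tibble(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

# Hypothetical per-element converter: one list of records -> one data frame.
tblfy_one <- function(x) dplyr::bind_rows(x)

# Hypothetical column-level wrapper: mutate_if() passes the entire column,
# so the per-element converter is applied to each entry with lapply().
tblfy_col <- function(col) lapply(col, tblfy_one)

res <- d %>% mutate_if(is.list, tblfy_col)

# Each entry of the list-column is now a data frame with columns x and y.
res$A[[1]]
```

The column B is left alone because is.list() returns FALSE for a character vector.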

Remark

A commenter has pointed out that the function tblfy() should be vectorized. That is a reasonable suggestion, but unless I have vectorized incorrectly, it does not seem to get at the root of the problem. Plugging in a vectorized version of tblfy(),

tblfy_vec <- Vectorize(tblfy)

into mutate_if() fails with the error

Error: wrong result size (4), expected 2 or 1

Update

After gaining some experience with purrr, I now find the following approach natural, if somewhat long-winded:

library(dplyr)  # %>% and as_data_frame()
library(purrr)  # map_if(), map(), map_df()

d %>%
  map_if(is.list, ~ map(., ~ map_df(., identity))) %>%
  as_data_frame()

This is more or less identical to @alistaire's solution, below, but uses map_if() and map() in place of mutate_if() and Vectorize(), respectively.
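
For readers on newer releases, the same idea can probably be written with mutate() and across(). This is a sketch under the assumption that dplyr 1.0.0 or later is installed (across() and where() do not exist in dplyr 0.5.0), and it uses bind_rows() as a stand-in for the per-element conversion.

```r
library(dplyr)
library(purrr)

d <- tibble::tibble(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

# across(where(is.list), ...) selects every list-column; map() then converts
# each entry of that column (a list of records) into a data frame.
res <- d %>%
  mutate(across(where(is.list), ~ map(.x, bind_rows)))

res
```

Unlike the map_if() version, this keeps the result a tibble throughout, so no final conversion step is needed.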

alistaire · Accepted Answer

The original tblfy function errors out for me (even when its elements are chained directly), so let's rebuild it a bit, adding vectorization as well, which lets us avoid an otherwise-necessary prior rowwise() call:

tblfy <- Vectorize(function(x){x %>% purrr::map_df(identity) %>% list()})

Now we can use mutate_if nicely:

d %>% mutate_if(purrr::is_list, tblfy)
## Source: local data frame [2 x 2]
## 
##                A     B
##           <list> <chr>
## 1 <tbl_df [2,2]>     A
## 2 <tbl_df [2,2]>     B

...and if we unnest to see what's there,

d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
## Source: local data frame [4 x 3]
## 
##       B     x     y
##   <chr> <chr> <dbl>
## 1     A     a     1
## 2     A     b     2
## 3     B     c     3
## 4     B     d     4
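
The list() wrapper inside the rebuilt tblfy appears to matter because Vectorize() simplifies its results the way sapply() does. The following base-R sketch uses a toy function, pair (a hypothetical name), in place of a data-frame-returning function; it suggests why an unwrapped version could produce the "wrong result size (4)" error quoted in the question, though that link is an inference rather than something verified against dplyr 0.5.0.

```r
# Toy stand-in: returns a length-2 list, much like a two-column data frame.
pair <- function(x) list(x = x, y = x * 2)

# Without wrapping, the two length-2 results are simplified into a
# 2 x 2 list-matrix, which has 4 elements in total.
m <- Vectorize(pair)(1:2)
dim(m)
length(m)

# Wrapping each result in list() makes every result length 1, so the
# simplification yields a plain list with one entry per input.
wrapped <- Vectorize(function(x) list(pair(x)))(1:2)
length(wrapped)
```

A length-2 list is what mutate_if() needs for a two-row data frame, which is consistent with the wrapped version working.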

A couple notes:

- map_df(identity) seems to be more efficient at building a tibble than any of the alternative formulations. I know the identity call seems unnecessary, but most everything else breaks.
- I'm not sure how widely useful tblfy will be, as it's somewhat dependent on the structure of the lists in the list column, which can vary enormously. If you have a lot with a similar structure, I suppose it's useful, though.
- There may be a way to do this with pmap instead of Vectorize, but I can't get it to work with some cursory tries.

eddi · Answer

In-place conversion without any copying:

library(data.table)

for (col in d) if (is.list(col)) lapply(col, setDF)

d
#Source: local data frame [2 x 2]
#
#                A B
#1 <S3:data.frame> A
#2 <S3:data.frame> B

Tags: r, dplyr, purrr

egnha
