
Mutating columns of a data frame based on a predicate function (dplyr::mutate_if)

Tags: r, dplyr, purrr

I would like to use dplyr's mutate_if() function to convert list-columns to data-frame-columns, but I run into a puzzling error when I try to do so. I am using dplyr 0.5.0, purrr 0.2.2, R 3.3.0.

The basic setup looks like this: I have a data frame d, some of whose columns are lists:

d <- dplyr::data_frame(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

I would like to convert the column of lists (in this case, d$A) to a column of data frames using the following function:

tblfy <- function(x) {
  x %>%
    purrr::transpose() %>%
    purrr::simplify_all() %>%
    dplyr::as_data_frame()
}

That is, I would like the list-column d$A to be replaced by the list lapply(d$A, tblfy), which is

[[1]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     a     1
2     b     2

[[2]]
# A tibble: 2 x 2
      x     y
  <chr> <dbl>
1     c     3
2     d     4

Of course, in this simple case, I could just do a simple reassignment. The point, however, is that I would like to do this programmatically, ideally with dplyr, in a generally applicable way that could deal with any number of list-columns.

Here's where I stumble: when I try to convert the list-columns to data-frame-columns with the following call,

d %>% dplyr::mutate_if(is.list, funs(tblfy))

I get an error message that I don't know how to interpret:

Error: Each variable must be named.
Problem variables: 1, 2

Why does mutate_if() fail? How can I properly apply it to get the desired result?

Remark

A commenter has pointed out that the function tblfy() should be vectorized. That is a reasonable suggestion. But, unless I have vectorized incorrectly, that does not seem to get at the root of the problem. Plugging in a vectorized version of tblfy(),

tblfy_vec <- Vectorize(tblfy)

into mutate_if() fails with the error

Error: wrong result size (4), expected 2 or 1
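
One plausible reading of this size error, offered as an assumption rather than something confirmed in the thread: Vectorize() is built on mapply() with SIMPLIFY = TRUE, so when every call returns a length-2 object (a two-column data frame is a length-2 list), the two results are folded into a single 2 x 2 list-matrix of length 4 instead of a list of length 2. The base-R sketch below reproduces that behavior with a hypothetical stand-in, two_cols(), so it runs without dplyr or purrr.

```r
# Hypothetical stand-in for tblfy(): returns a length-2 list,
# just as a two-column data frame is a list of length 2.
two_cols <- function(x) list(x = "a", y = 1)

# Vectorize() calls mapply(..., SIMPLIFY = TRUE); two length-2 results
# are simplified into one 2 x 2 list-matrix.
flat <- Vectorize(two_cols)(list(1, 2))
length(flat)  # 4, not 2
dim(flat)

# Wrapping each result in list() makes every result length 1, so the
# simplification step yields one element per input instead.
wrapped <- Vectorize(function(x) list(two_cols(x)))(list(1, 2))
length(wrapped)  # 2
```

If this reading is right, a vectorized function used inside mutate_if() has to return exactly one element per input element, which wrapping each result in list() achieves.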

Update

After gaining some experience with purrr, I now find the following approach natural, if somewhat long-winded:

d %>%
  map_if(is.list, ~ map(., ~ map_df(., identity))) %>%
  as_data_frame()

This is more or less identical to @alistaire's solution, below, but uses map_if() and map() in place of mutate_if() and Vectorize(), respectively.

egnha asked Jul 07 '16 18:07


2 Answers

The original tblfy function errors out for me (even when its elements are chained directly), so let's rebuild it a bit, adding vectorization as well, which lets us avoid an otherwise-necessary prior rowwise() call:

tblfy <- Vectorize(function(x){x %>% purrr::map_df(identity) %>% list()})

Now we can use mutate_if nicely:

d %>% mutate_if(purrr::is_list, tblfy)
## Source: local data frame [2 x 2]
## 
##                A     B
##           <list> <chr>
## 1 <tbl_df [2,2]>     A
## 2 <tbl_df [2,2]>     B

...and if we unnest to see what's there,

d %>% mutate_if(purrr::is_list, tblfy) %>% tidyr::unnest()
## Source: local data frame [4 x 3]
## 
##       B     x     y
##   <chr> <chr> <dbl>
## 1     A     a     1
## 2     A     b     2
## 3     B     c     3
## 4     B     d     4

A couple of notes:

  • map_df(identity) seems to be more efficient at building a tibble than any of the alternative formulations. I know the identity call seems unnecessary, but most everything else breaks.
  • I'm not sure how widely useful tblfy will be, as it's somewhat dependent on the structure of the lists in the list column, which can vary enormously. If you have a lot with a similar structure, I suppose it's useful, though.
  • There may be a way to do this with pmap instead of Vectorize, but I couldn't get it to work after a few cursory tries.
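
As a hedged sketch of the underlying issue (an interpretation, not something the thread states outright): mutate_if() hands each selected column to the function as a whole, so a function written for a single list element has to be mapped over the column explicitly. The example below assumes a recent dplyr, swaps in tibble::tibble() for data_frame() and dplyr::bind_rows() for map_df(identity), and uses a hypothetical helper name, tblfy_col().

```r
library(dplyr)

# Same data as in the question, built with tibble::tibble()
# (an assumption: data_frame() is deprecated in newer releases).
d <- tibble::tibble(
  A = list(
    list(list(x = "a", y = 1), list(x = "b", y = 2)),
    list(list(x = "c", y = 3), list(x = "d", y = 4))
  ),
  B = LETTERS[1:2]
)

# Hypothetical helper: takes the whole list-column and row-binds each
# element (a list of named records) into a data frame of its own.
tblfy_col <- function(col) lapply(col, dplyr::bind_rows)

# mutate_if() passes each list-column to tblfy_col() in one piece;
# the character column B fails the predicate and is left alone.
res <- d %>% mutate_if(is.list, tblfy_col)
res$A[[1]]
```

This avoids Vectorize() altogether; the trade-off is that the helper works on whole columns, so it cannot be reused directly on a single element.
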
alistaire answered Nov 14 '22 21:11


In-place conversion without any copying:

library(data.table)

for (col in d) if (is.list(col)) lapply(col, setDF)

d
#Source: local data frame [2 x 2]
#
#                A B
#1 <S3:data.frame> A
#2 <S3:data.frame> B
eddi answered Nov 14 '22 23:11