
R nested tibble map2 comparisons

Tags: r, purrr

I'm trying to use map2 to compare nested tibble columns. Here is my data format:

> tbl
# A tibble: 3 x 3
  ID    data.x           data.y          
  <chr> <list>           <list>          
1 a     <tibble [2 x 2]> <tibble [2 x 2]>
2 b     <tibble [2 x 2]> <tibble [2 x 2]>
3 c     <tibble [2 x 2]> <tibble [2 x 2]>

The tibbles in data.x and data.y have identical column names, but their values may differ. For each row, I would like to get the maximum value of the val column across both tibbles. I thought the following would work, but it only returns the max for data.x. I don't fully grasp how map2 works.

tbl %>%
  mutate(col1 = map2_dbl(data.x, data.y, ~ max(.$val)))

The result should be:

# A tibble: 3 x 4
  ID    data.x           data.y            col1
  <chr> <list>           <list>           <dbl>
1 a     <tibble [2 x 2]> <tibble [2 x 2]>    7.
2 b     <tibble [2 x 2]> <tibble [2 x 2]>    8.
3 c     <tibble [2 x 2]> <tibble [2 x 2]>    8.

data:

> dput(tbl)
structure(list(ID = c("a", "b", "c"), data.x = list(structure(list(
    text = c("Y", "Y"), val = c(1, 1)), .Names = c("text", "val"
), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(text = c("N", "N"), val = c(2, 2)), .Names = c("text", 
"val"), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(text = c("Y", "Y"), val = c(3, 3)), .Names = c("text", 
"val"), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), data.y = list(structure(list(text = c("Y", "Y"), val = c(6, 
7)), .Names = c("text", "val"), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(text = c("Y", "Y"), val = c(8, 
6)), .Names = c("text", "val"), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame")), structure(list(text = c("N", "N"), val = c(7, 
8)), .Names = c("text", "val"), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame")))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .Names = c("ID", "data.x", "data.y"
))
Hakki asked Mar 26 '18 11:03



1 Answer

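Based on the expected output, the goal is the maximum of 'val' across both 'data.x' and 'data.y' in each row. In a map2 formula, '.' is an alias for '.x', so max(.$val) only ever sees the 'data.x' element. Instead, extract the 'val' column from both '.x' and '.y', concatenate the two vectors (c), and take the max: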

tbl %>% 
    mutate(col1 = map2_dbl(data.x, data.y, ~ max(c(.x$val, .y$val))))
# A tibble: 3 x 4     
#    ID    data.x           data.y            col1
#   <chr> <list>           <list>           <dbl>
#1 a     <tibble [2 x 2]> <tibble [2 x 2]>  7.00
#2 b     <tibble [2 x 2]> <tibble [2 x 2]>  8.00
#3 c     <tibble [2 x 2]> <tibble [2 x 2]>  8.00
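
To make the difference between '.' and '.x'/'.y' concrete, here is a minimal sketch on two small lists. The names x, y, only_x and both are made up for illustration and only mimic the shape of data.x and data.y:

```r
library(purrr)
library(tibble)

# Hypothetical stand-ins for the data.x and data.y list-columns
x <- list(tibble(val = c(1, 1)), tibble(val = c(2, 2)))
y <- list(tibble(val = c(6, 7)), tibble(val = c(8, 6)))

# `.` is the same object as `.x`, so the elements of y are never looked at
only_x <- map2_dbl(x, y, ~ max(.$val))

# `.x` and `.y` are the paired elements of x and y at each step
both <- map2_dbl(x, y, ~ max(.x$val, .y$val))
```

Here only_x comes out as 1 and 2 (the maxima of x alone), while both comes out as 7 and 8. Passing the two vectors to max directly is equivalent to concatenating them with c first.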

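For more than two 'data' columns, pmap can be used. It takes a list of columns (here every column except 'ID', via .[-1]) and exposes them by position as ..1, ..2, and so on: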

tbl %>%
    mutate(col1 = pmap_dbl(.[-1], ~ max(c(..1$val, ..2$val))))
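
If the number of 'data' columns is not fixed, spelling out ..1, ..2, ... gets tedious. One possible variation, sketched here under the assumption that every nested tibble has a 'val' column, is to collect all arguments with ... and take the max over them. The third column data.z and the names cols and max_val are hypothetical and not part of the original data:

```r
library(purrr)
library(tibble)

# Hypothetical list-columns; data.z is invented to show a third column
cols <- list(
  data.x = list(tibble(val = c(1, 1)), tibble(val = c(2, 2))),
  data.y = list(tibble(val = c(6, 7)), tibble(val = c(8, 6))),
  data.z = list(tibble(val = c(9, 0)), tibble(val = c(3, 4)))
)

# For each row, gather that row's tibbles with `...`,
# take the max of `val` in each one, then the max of those
max_val <- pmap_dbl(cols, function(...) {
  max(map_dbl(list(...), ~ max(.x$val)))
})
```

With these made-up values, max_val is 9 for the first row and 8 for the second. Inside mutate, the same function could be applied to .[-1] in place of cols.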
akrun answered Nov 06 '22 02:11
