dplyr mutate specific columns by evaluating lookup cell value

I have explored various options using quosures, symbols, and evaluation, but I can't seem to get the right syntax. Here is an example dataframe.

x <- data.frame("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
x
  A B C D pastecols
1 a z a b      B, C
2 b y c d      B, D
3 c x e f   B, C, D
4 d w g h      <NA>

Now suppose I want to paste values from different columns based on the lookup string in pastecols, and I always want to include column A. This is my desired result:

  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d

Ideally this could be done in dplyr. This is the closest I have gotten:

x %>% mutate(result = lapply(lapply(str_split(pastecols, ", "), c, "A"), na.omit))
  A B C D pastecols     result
1 a z a b      B, C    B, C, A
2 b y c d      B, D    B, D, A
3 c x e f   B, C, D B, C, D, A
4 d w g h      <NA>          A
tcuthbertson asked Dec 03 '18 18:12

2 Answers

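Here's one way using pmap to do a similar thing. pmap can be used to work on a dataframe row by row by capturing each row as a named vector; you can then read the lookup string from that vector with ["pastecols"] and turn it into the column names (cols) used for indexing. Note that the result below is collapsed with ", "; use collapse = " " if you want the space-separated output from the question.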
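To see what "capturing each row as a named vector" means, here is a minimal sketch on a made-up two-column data frame. The name demo and its values are illustrative only and are not part of the question's data; it assumes purrr is installed.

```r
library(purrr)

# Hypothetical toy data, only to illustrate how pmap() walks over rows
demo <- data.frame(A = c("a", "b"), B = c("z", "y"), stringsAsFactors = FALSE)

# pmap() calls the function once per row, passing each column as a named
# argument; c(...) gathers those arguments into one named vector
rows <- pmap(demo, function(...) c(...))

rows[[1]]       # the first row as a named character vector
rows[[1]]["B"]  # index by column name, just like row["pastecols"]
```

Because c(...) builds an atomic vector, columns of mixed types would all be coerced to a common type (here everything is already character).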

Most of the anonymous function syntax is not tidyverse stuff, but just basic R stuff. To walk through it:

  1. Pass the dataframe as the list to the .l argument of pmap_chr. Remember that dataframes are lists of columns!
  2. Capture all the ... arguments with c(...). In effect, each row of the dataframe is passed as the arguments to the function, so row is a named vector containing that row. Note that this will break if you have list-columns (but so will a lot of other things here, so I assume there aren't any).
  3. We can get the lookup string from row["pastecols"], but we need to turn (say) "B, C" into c("A", "B", "C") before we can index with it. The next line prepends "A, ", replaces a missing value with "A", splits the string on ", ", and then extracts the first element of the resulting list. The `[[`(1) part is just how you write list[[1]] in a pipe chain; it's the prefix form of the operator. You need it because str_split returns a list and we just want the vector.
  4. Use this cols vector to get the desired values from row and return them, collapsed into a length 1 character vector!

library(tidyverse)
tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))

tbl %>%
  mutate(result = pmap_chr(
    .l = .,
    .f = function(...){
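      # The current row as a named character vector
      row <- c(...)
      # Column names to paste: always "A", plus any listed in pastecols
      cols <- row["pastecols"] %>%
        str_c("A, ", .) %>%
        replace_na("A") %>%
        str_split(", ") %>%
        `[[`(1)
      vals <- row[cols] %>% str_c(collapse = ", ")
      return(vals)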
    }
  ))
#> # A tibble: 4 x 6
#>   A     B     C     D     pastecols result    
#>   <chr> <chr> <chr> <chr> <chr>     <chr>     
#> 1 a     z     a     b     B, C      a, z, a   
#> 2 b     y     c     d     B, D      b, y, d   
#> 3 c     x     e     f     B, C, D   c, x, e, f
#> 4 d     w     g     h     <NA>      d

Created on 2018-12-03 by the reprex package (v0.2.0).
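The cols line is the densest part of the answer, so here is a sketch that unrolls it into named steps for a single row. The helper name lookup_cols and the intermediate variable names are hypothetical, introduced only for illustration; the calls themselves (str_c, replace_na, str_split) are the ones used above.

```r
library(stringr)  # str_c(), str_split()
library(tidyr)    # replace_na()

# Hypothetical helper: the cols pipeline from the answer, one step per line
lookup_cols <- function(pastecols) {
  with_a <- str_c("A, ", pastecols)   # "B, C" becomes "A, B, C"; NA stays NA
  filled <- replace_na(with_a, "A")   # a missing lookup falls back to just "A"
  str_split(filled, ", ")[[1]]        # str_split() returns a list; take its vector
}

# First row of the example data, as c(...) would capture it
row <- c(A = "a", B = "z", C = "a", D = "b", pastecols = "B, C")

cols   <- lookup_cols(row[["pastecols"]])
result <- str_c(row[cols], collapse = ", ")

# A row whose lookup string is missing
na_cols <- lookup_cols(NA_character_)
```

Here cols is c("A", "B", "C"), result is "a, z, a", and na_cols is just "A", which matches rows 1 and 4 of the output above.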

Calum You answered Oct 04 '22 23:10

Not the most elegant solution, but it gets the job done with just base R. If column A never shows up in pastecols, you can remove unique() from the code.

for(r in seq_len(nrow(df))) {
  df$result[r] <- paste(
                    df[r, na.omit(unique(c("A", unlist(strsplit(df$pastecols[r], ", ")))))],
                    collapse = " "
                  )
}
df

  A B C D pastecols  result
1 a z a b      B, C   a z a
2 b y c d      B, D   b y d
3 c x e f   B, C, D c x e f
4 d w g h      <NA>       d

Data -

df <- data.frame(
  "A" = letters[1:4],
  "B" = letters[26:23],
  "C" = letters[c(1,3,5,7)],
  "D" = letters[c(2,4,6,8)],
  "pastecols" = c("B, C","B, D", "B, C, D", NA), stringsAsFactors = FALSE
)
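
As a small illustration of why unique() is there, this sketch compares the column-name vector with and without it. The lookup value "A, B" is hypothetical: it does not occur in the example data and is used only to show what happens when pastecols already names column A.

```r
# Hypothetical lookup string that already contains column A
lookup <- "A, B"

# Same expression as in the loop, with and without unique()
with_unique    <- na.omit(unique(c("A", unlist(strsplit(lookup, ", ")))))
without_unique <- na.omit(c("A", unlist(strsplit(lookup, ", "))))

as.character(with_unique)     # "A" "B"
as.character(without_unique)  # "A" "A" "B"
```

Without unique(), column A would be selected twice and its value would appear twice in the pasted result.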
Shree answered Oct 05 '22 00:10