I have explored various options using quosures, symbols, and evaluation, but I can't seem to get the right syntax. Here is an example dataframe.
x <- data.frame("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA), stringsAsFactors = FALSE)
A B C D pastecols
1 a z a b B, C
2 b y c d B, D
3 c x e f B, C, D
4 d w g h <NA>
Now suppose I want to paste values from different columns based on the lookup string in pastecols, and I always want to include column A. This is my desired result:
A B C D pastecols result
1 a z a b B, C a z a
2 b y c d B, D b y d
3 c x e f B, C, D c x e f
4 d w g h <NA> d
Ideally this could be done in dplyr. This is the closest I have gotten:
x %>% mutate(result = lapply(lapply(str_split(pastecols, ", "), c, "A"), na.omit))
A B C D pastecols result
1 a z a b B, C B, C, A
2 b y c d B, D B, D, A
3 c x e f B, C, D B, C, D, A
4 d w g h <NA> A
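For comparison: the attempt above already produces the right column names for each row. What is missing is looking those names up in the same row. Below is a minimal base R sketch of that lookup step. It assumes character (not factor) columns. The vapply-over-row-indices approach and the helper name cols_per_row are illustrative choices, not code from the question.

```r
# The question's example data, with character (not factor) columns
x <- data.frame(
  A = letters[1:4], B = letters[26:23],
  C = letters[c(1, 3, 5, 7)], D = letters[c(2, 4, 6, 8)],
  pastecols = c("B, C", "B, D", "B, C, D", NA),
  stringsAsFactors = FALSE
)

# Per-row column names, as in the attempt, but with "A" first and NA dropped
# (as.vector() strips the na.action attribute that na.omit() attaches)
cols_per_row <- lapply(strsplit(x$pastecols, ", "),
                       function(cols) as.vector(na.omit(c("A", cols))))

# The missing step: look those names up in the same row and paste the values
x$result <- vapply(seq_len(nrow(x)), function(i) {
  paste(unlist(x[i, cols_per_row[[i]]]), collapse = " ")
}, character(1))

x$result
```

x[i, cols] returns a one-row data frame when several columns are selected, but a bare value when only "A" is selected. unlist() handles both cases before paste() joins them.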
Here's one way to do a similar thing using pmap. pmap can be used to work on dataframes row by row by capturing each row as a named vector. You can then get the desired column names for indexing as cols by selecting them with ["pastecols"].
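To see what pmap passes to the function, you can run it over a tiny data frame. This sketch assumes purrr is installed. The data frame small is a hypothetical example, not from the question.

```r
library(purrr)

# Hypothetical two-row data frame with character columns only
small <- data.frame(A = c("a", "b"), B = c("z", "y"), stringsAsFactors = FALSE)

# A data frame is a list of columns, so pmap() can walk it row by row:
# each call gets one row's values as named arguments (A = ..., B = ...),
# and c(...) collects them into a named character vector.
rows <- pmap(small, function(...) c(...))

is.list(small)  # TRUE
rows[[2]]       # named character vector: A = "b", B = "y"
```

If any column were numeric, c(...) would coerce the whole row to character. This pattern therefore works most cleanly when all columns are already character.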
Most of the anonymous function syntax is not tidyverse stuff, but just basic R stuff. To walk through it:
1. We pass the dataframe itself (the . from the pipe) to the .l argument of pmap_chr. Remember that dataframes are lists of columns!
2. We capture the ... arguments with c(...). Basically, we are calling each row of the dataframe as arguments to the function, so row is now a named vector containing that row. Note that this will break if you have list-columns (but so will a lot of other things here, so I assume there aren't any).
3. We want the column names listed in row["pastecols"], but first we need to turn (say) "B, C" into c("A", "B", "C"). The cols line adds the "A", replaces a missing value with "A", splits into pieces if there are any, and then indexes back down into the list. The `[[`(1) part is just how you do list[[1]] in a pipe chain; it's the prefix form of the operator. You need this because str_split returns a list and we just want the vector.
4. Finally, we use the cols vector to get the desired values from row and return them, collapsed into a length-1 character vector. The values are separated by ", "; use collapse = " " in str_c to get exactly the "a z a" form from the question.
library(tidyverse)
tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
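tbl %>%
  mutate(result = pmap_chr(
    .l = .,                 # the piped-in tbl; pmap_chr walks its rows
    .f = function(...) {
      row <- c(...)         # the current row as a named character vector
      # "B, C" -> c("A", "B", "C"); an NA pastecols becomes just "A"
      cols <- row["pastecols"] %>% str_c("A, ", .) %>% replace_na("A") %>% str_split(", ") %>% `[[`(1)
      # look those columns up in the row and collapse them into one string
      vals <- row[cols] %>% str_c(collapse = ", ")
      return(vals)
    }
  ))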
#> # A tibble: 4 x 6
#> A B C D pastecols result
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a z a b B, C a, z, a
#> 2 b y c d B, D b, y, d
#> 3 c x e f B, C, D c, x, e, f
#> 4 d w g h <NA> d
Created on 2018-12-03 by the reprex package (v0.2.0).
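The cols and vals lines can also be run by hand on a single row. This makes the last two parts of the walk-through easier to follow. The sketch assumes magrittr, stringr and tidyr are installed (all part of the tidyverse). row is a hand-built stand-in for what c(...) would capture for the first row.

```r
library(magrittr)
library(stringr)
library(tidyr)

# Hand-built stand-in for the first row as c(...) would capture it
row <- c(A = "a", B = "z", C = "a", D = "b", pastecols = "B, C")

# "B, C" -> "A, B, C" -> list(c("A", "B", "C")) -> c("A", "B", "C")
cols <- row["pastecols"] %>%
  str_c("A, ", .) %>%
  replace_na("A") %>%
  str_split(", ") %>%
  `[[`(1)  # prefix form of list_result[[1]]

# Look the names up in the row and collapse them into one string
vals <- str_c(row[cols], collapse = ", ")

# A missing pastecols: str_c() returns NA, and replace_na() turns it into plain "A"
na_cols <- NA_character_ %>%
  str_c("A, ", .) %>%
  replace_na("A") %>%
  str_split(", ") %>%
  `[[`(1)
```

Here vals should be "a, z, a" and na_cols just "A". These match the first and last rows of the reprex output.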
Not the most elegant solution, but it gets the job done with just base R. If column A never shows up in pastecols, you can remove unique() from the code.
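# Pre-allocate the result column
df$result <- NA_character_
for(r in seq_len(nrow(df))) {
  # Column names for this row: always "A", plus whatever pastecols lists (NA dropped)
  cols <- na.omit(unique(c("A", unlist(strsplit(df$pastecols[r], ", ")))))
  # Look up those columns in row r and paste the values with a space
  df$result[r] <- paste(df[r, cols], collapse = " ")
}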
df
A B C D pastecols result
1 a z a b B, C a z a
2 b y c d B, D b y d
3 c x e f B, C, D c x e f
4 d w g h <NA> d
Data -
df <- data.frame(
"A" = letters[1:4],
"B" = letters[26:23],
"C" = letters[c(1,3,5,7)],
"D" = letters[c(2,4,6,8)],
"pastecols" = c("B, C","B, D", "B, C, D", NA), stringsAsFactors = FALSE
)
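To see why unique() is there, consider a row whose pastecols already names A. The sketch below uses only base R and the same expressions as the loop. The one-row data frame one_row is hypothetical.

```r
# Hypothetical one-row data frame whose pastecols already names column A
one_row <- data.frame(A = "a", B = "z", pastecols = "A, B", stringsAsFactors = FALSE)

# Column names as the loop builds them, with and without unique()
cols_dup  <- c("A", unlist(strsplit(one_row$pastecols, ", ")))  # "A" "A" "B"
cols_uniq <- na.omit(unique(cols_dup))                           # "A" "B"

# Without unique(), column A is selected twice, so its value is pasted twice
without_unique <- paste(one_row[1, cols_dup], collapse = " ")   # "a a z"
with_unique    <- paste(one_row[1, cols_uniq], collapse = " ")  # "a z"
```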