dplyr concat columns stored in variable (mutate and non standard evaluation)

Tags: r, dplyr, nse

I would like to concatenate an arbitrary number of columns in a data frame, with the column names held in a variable cols_to_concat:

df <- dplyr::data_frame(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat = c("a", "b", "c")

To achieve the desired result with this specific value of cols_to_concat I could do this:

df %>% 
  dplyr::mutate(concat = paste0(a, b, c))

But I need to generalise this, using syntax a bit like this:

# (DOES NOT WORK: paste0() receives the column names as strings, not the columns)
df %>% 
  dplyr::mutate(concat = paste0(cols_to_concat))

I'd like to use the new NSE approach of dplyr 0.7.0, if this is appropriate, but can't figure out the correct syntax.

RobinL asked Dec 14 '22 22:12


2 Answers

You can do this using only the tidyverse, if you'd like to stick to those packages and principles, with either mutate() from dplyr or unite_() from tidyr.

Using mutate()

library(dplyr)
df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

df %>% mutate(new_col = do.call(paste0, .[cols_to_concat]))

# A tibble: 3 × 4
      a     b     c new_col
  <chr> <chr> <chr>   <chr>
1     a     d     g     adg
2     b     e     h     beh
3     c     f     i     cfi
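
As a rough illustration of why the do.call() line works: a data frame is a list of columns, so subsetting it with a character vector yields a list that do.call() passes to paste0() as separate arguments. The sketch below uses only base R; df_base and by_hand are made-up names for this example, and the unname() step is an extra precaution that the line above does not take.

```r
# A base R data frame with the same contents as df above
# (df_base is a hypothetical name used only in this sketch).
df_base <- data.frame(
  a = letters[1:3],
  b = letters[4:6],
  c = letters[7:9],
  stringsAsFactors = FALSE
)
cols_to_concat <- c("a", "b", "c")

# Subsetting by name gives a list of columns; unname() drops the names so
# a column called "collapse" is not mistaken for a paste0() argument.
concat <- do.call(paste0, unname(as.list(df_base[cols_to_concat])))

# The same call written out by hand for this specific set of columns.
by_hand <- paste0(df_base$a, df_base$b, df_base$c)

identical(concat, by_hand)
```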

Using unite_()

library(tidyr)
df %>% unite_(col='new_col', cols_to_concat, sep="", remove=FALSE)

# A tibble: 3 × 4
  new_col     a     b     c
*   <chr> <chr> <chr> <chr>
1     adg     a     d     g
2     beh     b     e     h
3     cfi     c     f     i
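
unite_() has since been deprecated in tidyr. A minimal sketch of the same step with unite(), assuming a recent tidyr and tidyselect 1.0.0 or later so that all_of() is available for selecting columns from a character vector (united is a name made up for this example):

```r
library(dplyr)
library(tidyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

# all_of() tells tidyselect to treat cols_to_concat as a vector of column names
united <- df %>%
  unite("new_col", all_of(cols_to_concat), sep = "", remove = FALSE)

united$new_col
```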

EDITED July 2020

As of dplyr 1.0.0, across() and c_across() supersede the scoped variants such as mutate_if(), mutate_at() and mutate_all(), and the underscore verbs (e.g. unite_()) are deprecated. Below is an example using that convention. It is not the most concise, but it is an option that promises to be more extensible.

Using c_across()

library(dplyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

df %>% 
  rowwise() %>% 
  mutate(new_col = paste0(c_across(all_of(cols_to_concat)), collapse=""))
#> # A tibble: 3 x 4
#> # Rowwise: 
#>   a     b     c     new_col
#>   <chr> <chr> <chr> <chr>  
#> 1 a     d     g     adg    
#> 2 b     e     h     beh    
#> 3 c     f     i     cfi

Created on 2020-07-08 by the reprex package (v0.3.0)
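
rowwise() evaluates the expression once per row, which can be slow on large data. A possible vectorised alternative, assuming dplyr 1.1.0 or later (where pick() was introduced), combines the do.call() idea from earlier in this answer with tidyselect. The name result is made up for this sketch.

```r
library(dplyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

# pick() returns the selected columns as a tibble (a list of columns);
# unname() stops column names from being matched to paste0() arguments.
result <- df %>%
  mutate(new_col = do.call(paste0, unname(as.list(pick(all_of(cols_to_concat))))))

result$new_col
```

Unlike the rowwise() version, the result here is an ordinary ungrouped tibble.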

Steven M. Mortimer answered May 16 '23 08:05


You can try syms() from rlang, which turns the column names into symbols that !!! then splices into the call:

library(dplyr)
packageVersion('dplyr')
#[1] ‘0.7.0’
df <- dplyr::data_frame(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat = c("a", "b", "c")

library(rlang)
cols_quo <- syms(cols_to_concat)
df %>% mutate(concat = paste0(!!!cols_quo))

# or
df %>% mutate(concat = paste0(!!!syms(cols_to_concat)))

# # A tibble: 3 x 4
#       a     b     c concat
#   <chr> <chr> <chr>  <chr>
# 1     a     d     g    adg
# 2     b     e     h    beh
# 3     c     f     i    cfi
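
If the concatenation is needed in more than one place, the same idea can be wrapped in a function. This is only a sketch: concat_cols() and its arguments are hypothetical names, not part of dplyr or rlang, and it assumes dplyr 0.7.0 or later.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: concatenates the columns named in `cols`
# into a new column whose name is given by the string `new_col`.
concat_cols <- function(data, cols, new_col = "concat") {
  data %>%
    mutate(!!new_col := paste0(!!!syms(cols)))
}

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])

# Concatenate only columns a and c into a new column called ac
out <- concat_cols(df, c("a", "c"), new_col = "ac")
out$ac
```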

mt1022 answered May 16 '23 08:05