How to use dplyr operations with a list of strings for column names

Question

Is there a robust way to store column names as strings in a variable (for example, a character vector) and pass that variable to the various dplyr operations?

I have just been getting into dplyr.

When I operate on a subset of columns in a data frame, dplyr works well as long as I name the columns explicitly, one by one, in a comma-separated list.

This code works as expected:

library(dplyr)

# Create dataframe
df <- data.frame(
    a = c(1, 1, 1, 2, 2, 2)
    , b = c(1, 2, 3, 1, 2, 3)
    , c = c(1, 2, 1, 2, 1, 2)
    )

# Identify combinations of a and c that occur more than once
df %>%
    select(a, c) %>%
    count(a, c) %>%
    filter(n > 1)

Sometimes, though, I already have a character vector of column names that I would like to pass to the dplyr steps instead of naming each column explicitly. I have not found an easy, convenient way to do this that works across several dplyr operations:

This code does not work:

# Attempting the same thing with a character vector of relevant column names
relevantCols <- c("a", "c")

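# Does not work as hoped (recent tidyselect versions flag a bare
# character vector here as ambiguous and suggest all_of(relevantCols))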
df %>%
    select(relevantCols)
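As a minimal sketch of the select() step (assuming dplyr 1.0.0 or later, which re-exports tidyselect's all_of()), wrapping the character vector in all_of() makes it explicit that relevantCols holds column names rather than naming a column itself:

```r
library(dplyr)

df <- data.frame(
    a = c(1, 1, 1, 2, 2, 2)
    , b = c(1, 2, 3, 1, 2, 3)
    , c = c(1, 2, 1, 2, 1, 2)
    )
relevantCols <- c("a", "c")

# all_of() marks relevantCols as an external character vector of names;
# it raises an error if any listed column is missing from df
selected <- df %>%
    select(all_of(relevantCols))

print(names(selected))
```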

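# Trying to make a new variable based on my relevantCols variable
# (eval() of a string returns the string itself, so this is still
# the character vector c(a = "a", c = "c"))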
colsForDplyr <- sapply(relevantCols, eval)

df %>%
    # First step succeeds
    select(colsForDplyr) %>%
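    # Fails at the count step: count() evaluates colsForDplyr as an
    # expression (a length-2 vector), not as a set of column names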
    count(colsForDplyr)
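A quick way to see why the count() step fails is to inspect what sapply(relevantCols, eval) actually returns. This sketch assumes only base R plus rlang: evaluating a string simply gives the string back, so colsForDplyr is still a named character vector, whereas rlang::syms() produces the bare symbols that data-masking verbs such as count() expect.

```r
library(rlang)

relevantCols <- c("a", "c")

# eval("a") returns "a", so this is still a character vector: c(a = "a", c = "c")
colsForDplyr <- sapply(relevantCols, eval)
str(colsForDplyr)

# syms() turns each string into a symbol, i.e. a bare column reference
colSyms <- syms(relevantCols)
str(colSyms)
```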

In the simple example above, it is no big deal to retype 'a, c' in every dplyr operation. With a longer list of columns, however, I would rather pass a single variable to each dplyr operation than retype the column names over and over.

Any tips on how to achieve this?

I will accept a solution that shows how to create, from a list of column names, a single variable that can be used in various dplyr operations instead of retyping each column name every time.
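One way to meet that requirement, sketched here under the assumption that dplyr (1.0.0 or later, for summarise()'s .groups argument) and rlang are installed, is to convert the names to symbols once and splice that single variable into each verb with !!!. The variable name colSyms is only illustrative.

```r
library(dplyr)
library(rlang)

df <- data.frame(
    a = c(1, 1, 1, 2, 2, 2)
    , b = c(1, 2, 3, 1, 2, 3)
    , c = c(1, 2, 1, 2, 1, 2)
    )
relevantCols <- c("a", "c")

# Convert once; reuse the list of symbols wherever dplyr expects bare names
colSyms <- syms(relevantCols)

# Combinations of a and c that occur more than once
dupes <- df %>%
    count(!!!colSyms) %>%
    filter(n > 1)

# Sort by the same columns
ordered <- df %>%
    arrange(!!!colSyms)

# Group by the same columns and summarise another column
perGroup <- df %>%
    group_by(!!!colSyms) %>%
    summarise(total_b = sum(b), .groups = "drop")
```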

Ronak Shah · Accepted Answer

We can use syms() from rlang, which converts strings to symbols, together with the splice operator !!! to pass column names stored in a variable.

library(dplyr)
library(rlang)

relevantCols <- c("a", "c")

df %>%
  count(!!!syms(relevantCols)) %>%
  filter(n > 1)

#  a c n
#1 1 1 2
#2 2 2 2
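Because syms() works on any character vector, the same pattern can be wrapped in a small function. find_duplicate_combos() below is a hypothetical helper, not part of dplyr or rlang; this is a sketch assuming both packages are installed.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: rows of `data` whose values in `cols` occur more than once
find_duplicate_combos <- function(data, cols) {
  data %>%
    count(!!!syms(cols)) %>%
    filter(n > 1)
}

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(1, 2, 3, 1, 2, 3),
  c = c(1, 2, 1, 2, 1, 2)
)

dupAC <- find_duplicate_combos(df, c("a", "c"))
dupA <- find_duplicate_combos(df, "a")
```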

akrun · Answer

We can use across() with all_of() from dplyr (version 1.0.0 or later) without loading any other packages.

library(dplyr)
df %>% 
     count(across(all_of(relevantCols))) %>% 
     filter(n > 1)
#   a c n
#1 1 1 2
#2 2 2 2
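The same all_of() selection can be reused in other verbs. In this sketch (assuming dplyr 1.0.0 or later), tidy-select verbs such as select() take all_of() directly, while data-masking verbs such as group_by() take it wrapped in across(). dplyr 1.1.0 also added pick(), which its documentation recommends over calling across() without a function.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(1, 2, 3, 1, 2, 3),
  c = c(1, 2, 1, 2, 1, 2)
)
relevantCols <- c("a", "c")

# Tidy-select verbs accept all_of() directly
subsetCols <- df %>%
  select(all_of(relevantCols))

# Data-masking verbs need all_of() inside across()
totals <- df %>%
  group_by(across(all_of(relevantCols))) %>%
  summarise(total_b = sum(b), .groups = "drop")
```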

Tags: string, r, multiple-columns, dplyr

Jayden.Cameron

2 Answers

Ronak Shah

akrun

Recent Activity

Donate For Us

How to use dplyr operations with a list of strings for column names

Tags:

string

r

multiple-columns

dplyr

Jayden.Cameron

2 Answers

Ronak Shah

akrun

Related questions

Recent Activity

Donate For Us