
How to use dplyr operations with a list of strings for column names

Is there a robust way to use a variable that contains a list of strings that correspond to dataframe column names for passing to the various dplyr operations?

I have just been getting into dplyr.

When I try to use operations on a subset of columns in a dataframe, dplyr does great when I name the columns explicitly and one-by-one in comma-separated lists.

This code works as expected:

library(dplyr)

# Create dataframe
df <- data.frame(
    a = c(1, 1, 1, 2, 2, 2)
    , b = c(1, 2, 3, 1, 2, 3)
    , c = c(1, 2, 1, 2, 1, 2)
    )

# Identify rows where a * c is duplicated
df %>%
    select(a, c) %>%
    count(a, c) %>%
    filter(n > 1)

However, there are times when I already have a list of column names that I would like to pass into the dplyr steps instead of naming each column explicitly. I have not found an easy, convenient way to do this that is robust enough to work with several dplyr operations.

This code does not work:

# Attempting to do the same with a character vector of relevant column names
relevantCols <- c("a", "c")

# Fails
df %>%
    select(relevantCols)

# Trying to make new variable based on my relevantCols variable
colsForDplyr <- sapply(relevantCols, eval)

df %>%
    # First step succeeds
    select(colsForDplyr) %>%
    # Fails at count step
    count(colsForDplyr)

In the simple example above, it is no big deal to re-type 'a, c' in every dplyr operation. However, if I have a list of columns that is longer, I would rather pass a variable into the dplyr operations instead of re-typing a list of column names over-and-over again.

Any tips on how to achieve this?

I will accept a solution that shows how to create a variable from a list of column names that can be used in various dplyr operations in place of retyping each column name over and over.

Jayden.Cameron asked Oct 28 '25 14:10



2 Answers

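We can use syms() with !!! to pass column names stored in a variable: syms() converts the strings to symbols, and !!! splices them into the call as if they had been typed out.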

library(dplyr)
library(rlang)

relevantCols <- c("a", "c")

df %>%
  count(!!!syms(relevantCols)) %>%
  filter(n > 1)

#  a c n
#1 1 1 2
#2 2 2 2
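
The same splice should carry over to other verbs that take column expressions, which is what the question asks for. Here is a minimal sketch, assuming the df from the question (recreated below so the block runs on its own). The names colSyms, dup_rows and grouped_sum are only illustrative and do not come from the original post.

```r
library(dplyr)
library(rlang)

# Same data as in the question, recreated so this runs standalone
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(1, 2, 3, 1, 2, 3),
  c = c(1, 2, 1, 2, 1, 2)
)

relevantCols <- c("a", "c")

# Convert the strings to symbols once and reuse them (illustrative name)
colSyms <- syms(relevantCols)

# select() followed by count(), the two steps attempted in the question
dup_rows <- df %>%
  select(!!!colSyms) %>%
  count(!!!colSyms) %>%
  filter(n > 1)

# group_by() accepts the same splice
grouped_sum <- df %>%
  group_by(!!!colSyms) %>%
  summarise(total_b = sum(b), .groups = "drop")
```

Storing the result of syms() in a variable means the conversion is written once, and each verb only needs the !!! prefix.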
Ronak Shah answered Oct 30 '25 07:10



We can use across() with all_of() from dplyr, without having to load any other packages.

library(dplyr)

# relevantCols as defined in the question
relevantCols <- c("a", "c")

df %>%
  count(across(all_of(relevantCols))) %>%
  filter(n > 1)
#   a c n
#1 1 1 2
#2 2 2 2
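
Since the question asks for something that works across several operations, here is a rough sketch of the same idea with a few other verbs. It assumes the df from the question (recreated so the block is self-contained) and dplyr 1.0.0 or later. The result names selected, grouped_sum and unique_pairs are made up for illustration.

```r
library(dplyr)

# Same data as in the question, recreated so this runs standalone
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(1, 2, 3, 1, 2, 3),
  c = c(1, 2, 1, 2, 1, 2)
)

relevantCols <- c("a", "c")

# select() understands tidyselect helpers directly, so across() is not needed
selected <- df %>%
  select(all_of(relevantCols))

# Data-masking verbs such as group_by() wrap the selection in across()
grouped_sum <- df %>%
  group_by(across(all_of(relevantCols))) %>%
  summarise(total_b = sum(b), .groups = "drop")

# distinct() can take the same across() call
unique_pairs <- df %>%
  distinct(across(all_of(relevantCols)))
```

As a rule of thumb, verbs that select columns (select, rename, relocate) take all_of(relevantCols) as is, while verbs that compute on columns (count, group_by, distinct, arrange) need it wrapped in across(). From dplyr 1.1.0 onward, pick() can be used in place of across() for this kind of selection.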
akrun answered Oct 30 '25 07:10



