I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a variable?
    library(dplyr)
    df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
    df
    #   this that
    # 1    1    1
    # 2    2    1
    # 3    2    2
    df %>% filter(this == 1)
    #   this that
    # 1    1    1
But say I want to use the variable column to hold either "this" or "that", and filter on whatever the value of column is. Both as.symbol and get work in other contexts, but not this:
    column <- "this"
    df %>% filter(as.symbol(column) == 1)
    # [1] this that
    # <0 rows> (or 0-length row.names)
    df %>% filter(get(column) == 1)
    # Error in get("this") : object 'this' not found
How can I turn the value of column into a column name?
From the current dplyr documentation:
dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
So, essentially we need to perform two steps to be able to refer to the value "this" of the variable column inside dplyr::filter():
1. We need to turn the variable column, which is of type character, into type symbol. Using base R, this can be achieved with as.symbol(), which is an alias for as.name(). The former is preferred by the tidyverse developers because it follows a more modern terminology (R types instead of S modes). Alternatively, the same can be achieved with rlang::sym() from the tidyverse.

2. We need to inject the symbol from 1) into the dplyr::filter() expression. This is done with the so-called injection operator !!, which is basically syntactic sugar that lets you modify a piece of code before R evaluates it. (In earlier versions of dplyr, or rather the underlying rlang, there used to be situations (including yours) where !! would collide with the single !, but this is no longer an issue since !! gained the right operator precedence.)
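The two steps can also be looked at one at a time. The following is only a sketch: it reuses the data frame from the question, and the names sym_base, sym_rlang, injected and result are made up for illustration. rlang::expr() is used here merely to capture the code after injection so it can be inspected before anything is evaluated.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# Step 1: turn the character string into a symbol.
sym_base  <- as.symbol(column)   # base R
sym_rlang <- rlang::sym(column)  # rlang equivalent
is.symbol(sym_base)              # check that it is no longer a character vector
identical(sym_base, sym_rlang)   # both routes should yield the same symbol

# Step 2: inject the symbol with !!. rlang::expr() captures the code
# after injection without running it, so the result can be inspected.
injected <- rlang::expr(filter(df, !!sym_base == 1))
print(injected)

# Evaluating the captured call runs the filter on the injected column.
result <- eval(injected)
result
```

Printing the captured call is a convenient way to check what filter() will actually see once !! has done its work.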
Applied to your example:
    library(dplyr)
    df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
    column <- "this"
    df %>% filter(!!as.symbol(column) == 1)
    #   this that
    # 1    1    1
Other ways to refer to the value "this" of the variable column inside dplyr::filter() that don't rely on rlang's injection paradigm include:
Via the tidyselection paradigm, i.e. dplyr::if_any()/dplyr::if_all() with tidyselect::all_of():

    df %>% filter(if_any(.cols = all_of(column), .fns = ~ .x == 1))

Via rlang's .data pronoun and base R's [[:

    df %>% filter(.data[[column]] == 1)

Via magrittr's . argument placeholder and base R's [[:

    df %>% filter(.[[column]] == 1)
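Since the goal was to switch between "this" and "that", any of these can be wrapped in a small function. The sketch below uses the .data pronoun; filter_by() is a hypothetical helper name, not something dplyr provides, and the comparison at the end simply checks that the approaches above agree on the example data.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))

# Hypothetical helper (not part of dplyr): keep the rows of `data`
# where the column named by the string `column` equals `value`.
filter_by <- function(data, column, value) {
  data %>% filter(.data[[column]] == value)
}

by_this <- filter_by(df, "this", 1)  # filters on column "this"
by_that <- filter_by(df, "that", 1)  # filters on column "that"

# The different approaches, applied to the same column name.
column <- "this"
via_injection <- df %>% filter(!!as.symbol(column) == 1)
via_if_any    <- df %>% filter(if_any(.cols = all_of(column), .fns = ~ .x == 1))
via_data      <- df %>% filter(.data[[column]] == 1)
via_dot       <- df %>% filter(.[[column]] == 1)
```

Note that the last variant depends on the magrittr pipe %>%, because the . placeholder is a magrittr feature.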