
How to refer to variable instead of column with dplyr

Tags: r, dplyr

When using dplyr::filter, I often compute a local variable that holds the viable choices:

library(dplyr)

df <- as_tibble(data.frame(id = c("a", "b"), val = 1:6))  # id is recycled to length 6
ids <- c("b", "c")
filter(df, id %in% ids)
# evaluates id %in% c("b", "c"), as intended

However, if the dataset by chance has a column with the same name, this fails to achieve the intended purpose:

df$ids <- "a"
filter(df, id %in% ids)
# evaluates id %in% "a", because ids now resolves to the column

How should I explicitly refer to the ids variable instead of the ids column?

Asked by Thomas Wutzler, Dec 05 '17 17:12


1 Answer

Unquote with !! so that ids is evaluated in the calling environment and its value is inlined before filter looks anything up in the data frame:

library(tidyverse)

df <- tibble(id = rep(c("a", "b"), 3), val = 1:6)  # tibble() replaces the deprecated data_frame()
ids <- c("b", "c")

df %>% filter(id %in% ids)
#> # A tibble: 3 x 2
#>      id   val
#>   <chr> <int>
#> 1     b     2
#> 2     b     4
#> 3     b     6

df <- df %>% mutate(ids = "a")

df %>% filter(id %in% ids)
#> # A tibble: 3 x 3
#>      id   val   ids
#>   <chr> <int> <chr>
#> 1     a     1     a
#> 2     a     3     a
#> 3     a     5     a

df %>% filter(id %in% !!ids)
#> # A tibble: 3 x 3
#>      id   val   ids
#>   <chr> <int> <chr>
#> 1     b     2     a
#> 2     b     4     a
#> 3     b     6     a

Of course, the better way to avoid such issues is to not put identically-named vectors in your global environment.
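
An alternative to !! that may read more explicitly is the .data and .env pronouns, which data-masking verbs such as filter make available. The sketch below assumes a reasonably recent dplyr (1.0 or later) and reuses the example data from above; res is just an illustrative name:

```r
library(dplyr)

# Same data as above, including the clashing ids column
df <- tibble(id = rep(c("a", "b"), 3), val = 1:6, ids = "a")
ids <- c("b", "c")

# .data$id is the column in the data frame;
# .env$ids is the variable in the calling environment, not the ids column
res <- df %>% filter(.data$id %in% .env$ids)
res
```

This keeps the rows where id is "b", the same result as with !!ids, and the pronouns make it visible at a glance which name comes from where.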

Answered by alistaire, Sep 21 '22 19:09