
Avoid conflicts between vector and variable name in dplyr

I'm using dplyr within a function that takes a data.frame df as an argument.

At some point, I want to filter based on a vector I've just created named n. However, this won't work if n is also the name of a variable in the input data.frame.

library(dplyr)
df <- data.frame(n = c(0L, 0L))
n <- c(1L, 1L)
filter(df, n == 1L)
#> [1] n
#> <0 rows> (or 0-length row.names)

Since the function should work for any data frame, I would like to avoid this clash. I tried a formula and a lazy object, both created in the global environment, but each returned the same result:

a <- ~ n == 1L
filter_(df, a)
#> [1] n
#> <0 rows> (or 0-length row.names)
library(lazyeval)  # lazy() is exported by lazyeval, not by dplyr
a <- lazy(n == 1L)
filter_(df, a)
#> [1] n
#> <0 rows> (or 0-length row.names)

Is there an elegant way to do it?

Matthew asked Nov 10 '14 14:11


2 Answers

All the previous answers are outdated: since version 0.7.0, dplyr supports rlang's quoting and unquoting semantics (tidy evaluation).

Use !! n to unquote n: its value is taken from the calling environment and inlined into the expression, so it is no longer looked up as the column n.

library(dplyr)
df <- data.frame(n = c(0L, 0L))
n <- c(1L, 1L)
filter(df, !! n == 1L)

##   n
## 1 0
## 2 0
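
If the `!! n == 1L` spelling is hard to read, two other forms may make the intent clearer. This is only a sketch, and it assumes a dplyr/rlang version recent enough to provide the `.env` pronoun, which is not part of the answer above:

```r
library(dplyr)

df <- data.frame(n = c(0L, 0L))
n <- c(1L, 1L)

# Parenthesise the unquoted value so the order is explicit:
# the vector n is inlined first, then compared with 1L.
res_unquote <- filter(df, (!!n) == 1L)

# .env$n always refers to the n in the calling environment;
# .data$n (not used here) would always refer to the column.
res_env <- filter(df, .env$n == 1L)

print(res_unquote)
print(res_env)
```

Both calls compare the external vector, not the column, so they keep both rows of df.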

Another example using the classic mtcars:

gear <- 5

# gear == gear is true for all rows!
# this returns the whole dataset
filter(mtcars, gear == gear)

# this works as intended
filter(mtcars, gear == !! gear)

##   mpg cyl  disp  hp drat    wt qsec vs am gear carb
## 1 26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
## 2 30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## 3 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## 4 19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## 5 15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
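
Since the question is about a function that must work for any data frame, the same idea can be sketched inside a function. The helper name `filter_by_local` and the variable `keep` are hypothetical, chosen only to reproduce the name clash:

```r
library(dplyr)

# Hypothetical helper: filter on a vector built inside the function,
# even when the input has a column with the same name.
filter_by_local <- function(df) {
  keep <- rep(1L, nrow(df))    # local vector with a deliberately clashing name
  filter(df, (!!keep) == 1L)   # unquote: use the local vector, not the column
}

# Same function without unquoting, for comparison: the column wins.
filter_masked <- function(df) {
  keep <- rep(1L, nrow(df))
  filter(df, keep == 1L)
}

df <- data.frame(keep = c(0L, 0L))
out_local  <- filter_by_local(df)
out_masked <- filter_masked(df)
```

Here `out_local` keeps both rows, while `out_masked` is empty because `keep` resolves to the column of zeros.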
asachet answered Sep 26 '22 20:09


Because n is both a column name and an object holding values, you can use interp() from lazyeval to substitute the value of n into the formula, so that n is treated as a value and not as a column. This appears to do what you want.

library(lazyeval)
filter_(df, interp(~n == 1L, n = n))

  n
1 0
2 0

I first tried the more complex

filter_(df, interp(~n == 1L, .values = list(n = n)))

but the simpler version seems to work the same way.
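
To see why this works, it may help to inspect what interp() returns. A small sketch, assuming lazyeval is installed (`f` and `rhs` are names introduced here for illustration):

```r
library(lazyeval)

n <- c(1L, 1L)

# interp() replaces the symbol n in the formula with the value of n
f <- interp(~ n == 1L, n = n)
print(f)

# The right-hand side no longer mentions a variable called n,
# so there is nothing left for a column named n to mask.
rhs <- f_rhs(f)
remaining_vars <- all.vars(rhs)
result <- eval(rhs)
```

After interpolation the expression contains the vector itself, so evaluating it does not depend on any column named n in the data frame.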

aosmith answered Sep 25 '22 20:09