dplyr: filter where two columns in data.frame are equal

Tags: r, dplyr

In base R, one can easily filter to rows where two columns are equal, like so:

mtcars[mtcars$cyl==mtcars$carb,]

Using dplyr's filter, this is just as easy:

mtcars %>% filter(cyl==carb)

But if I am writing a function around this code, I would want to use filter_, and the following doesn't work:

mtcars %>% filter_("cyl"=="carb")

In this case it treats "carb" as a value to test rather than as a variable.

My question is: how can you use filter_ to compare two variables in a data.frame?

Carl, asked Dec 19 '22 18:12


2 Answers

Put the whole thing in quotes:

mtcars %>% filter_("cyl==carb")

Or, as effel has already suggested, this will also work:

mtcars %>% filter_(~cyl==carb)
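
Since the question is about wrapping this in a function, here is a minimal sketch of what such a wrapper might look like when the column names arrive as strings. The helper names filter_equal and filter_equal_base are hypothetical. Note the assumption: filter_ was deprecated in dplyr 0.7.0, so this sketch swaps in filter() with the .data pronoun instead of filter_, plus a base R variant for comparison.

```r
library(dplyr)

# Hypothetical helper: keep the rows where the columns named by the
# strings `a` and `b` hold equal values. Uses filter() with the .data
# pronoun rather than the deprecated filter_().
filter_equal <- function(df, a, b) {
  df %>% filter(.data[[a]] == .data[[b]])
}

# Base R variant of the same idea. which() drops rows where the
# comparison is NA, which matches how filter() treats NA.
filter_equal_base <- function(df, a, b) {
  df[which(df[[a]] == df[[b]]), , drop = FALSE]
}

res <- filter_equal(mtcars, "cyl", "carb")
res_base <- filter_equal_base(mtcars, "cyl", "carb")
```

Both calls should select the same rows of mtcars as mtcars[mtcars$cyl==mtcars$carb,] from the question.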

C8H10N4O2, answered Jan 29 '23 13:01


There's more on this here.

It’s best to use a formula, because a formula captures both the expression to evaluate and the environment in which it should be evaluated. This is important if the expression is a mixture of variables in the data frame and objects in the local environment.

library(dplyr)

airquality %>% filter_(~Month == Day)
#   Ozone Solar.R Wind Temp Month Day
# 1    NA      NA 14.3   56     5   5
# 2    NA     264 14.3   79     6   6
# 3    77     276  5.1   88     7   7
# 4    89     229 10.3   90     8   8
# 5    21     230 10.9   75     9   9
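
To illustrate the point about a formula carrying its environment, here is a small base R sketch that does not depend on dplyr. The function make_condition and its argument min_month are hypothetical names made up for this example; the evaluation with eval() is a stand-in for what a formula-aware function could do, not a description of dplyr's internals.

```r
# Hypothetical helper: builds a condition that mixes data frame columns
# (Month, Day) with an object from the local environment (min_month).
make_condition <- function(min_month) {
  ~ Month == Day & Month >= min_month
}

cond <- make_condition(7)

# The formula remembers the environment it was created in, so min_month
# is still reachable after make_condition() has returned.
captured <- environment(cond)$min_month

# Evaluate the right-hand side of the formula: column names are looked
# up in airquality, everything else in the formula's own environment.
keep <- eval(cond[[2]], airquality, environment(cond))
res <- airquality[keep, ]
```

A plain string such as "Month == Day & Month >= min_month" carries no environment, so whoever evaluates it has to guess where min_month lives.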

Alternatively:

There are three ways to quote inputs that dplyr understands: with a formula, ~ mean(mpg); with quote(), quote(mean(mpg)); or as a string, "mean(mpg)".
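
As a rough illustration of why the three forms are interchangeable, the sketch below evaluates each one against mtcars using only base R. The variable names are made up for the example, and the manual eval() calls are an assumption about how one could unpack each form by hand, not how dplyr does it.

```r
# The same expression, quoted three ways.
as_formula <- ~ mean(mpg)
as_call    <- quote(mean(mpg))
as_string  <- "mean(mpg)"

# Each form reduces to the call mean(mpg), evaluated here with the
# columns of mtcars in scope.
from_formula <- eval(as_formula[[2]], mtcars)               # right-hand side of the formula
from_call    <- eval(as_call, mtcars)                       # already a call
from_string  <- eval(parse(text = as_string)[[1]], mtcars)  # parse the string first
```

All three results should equal mean(mtcars$mpg). Only the formula also records an environment, which is the reason given above for preferring it.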

effel, answered Jan 29 '23 11:01