Better way to filter a data frame with dplyr using OR?

I have a data frame in R with columns subject1 and subject2 (which contain Library of Congress subject headings). I'd like to filter the data frame by testing whether the subjects match an approved list. Say, for example, that I have this data frame.

data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion")
)

And suppose this is the list of approved subjects.

condition <- c("History", "Religion")

What I want to do is filter by either subject1 or subject2:

subset <- filter(data, subject1 %in% condition | subject2 %in% condition)

That returns rows 1, 2, and 4 from the original data frame, as desired.

Is that the best way to filter on multiple columns using OR rather than AND logic? It seems like there must be a better, more idiomatic way, but I don't know what it is.

Maybe a more generic way to ask the question is this: if I combine subject1 and subject2, is there a way of testing whether any value in one vector matches any value in another vector? I'd like to write something like:

subset <- filter(data, c(subject1, subject2) %in% condition)

asked Feb 07 '14 20:02 by Lincoln Mullen

1 Answer

I'm not sure whether this approach is better. At least you don't have to write the column names:

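library(dplyr)
# sapply() builds a logical matrix (one column per subject column);
# a row is kept when at least one of its cells is in `condition`.
# The "> 0" turns the row counts into the logical vector filter() expects.
filter(data, rowSums(sapply(data, "%in%", condition)) > 0)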
#             subject1  subject2
# 1            History Chemistry
# 2            Biology  Religion
# 3 Digital Humanities  Religion
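
As a possible alternative that is not part of the answer as originally posted: assuming a dplyr version that provides if_any() (1.0.4 or later, as far as I know), the OR logic can probably be expressed directly over a set of columns. The sketch below rebuilds the question's data and also shows a base R equivalent using Reduce(); the names approved, keep, and approved_base are only illustrative.

```r
library(dplyr)

# Data and approved list copied from the question.
data <- data.frame(
  subject1 = c("History", "Biology", "Physics", "Digital Humanities"),
  subject2 = c("Chemistry", "Religion", "Chemistry", "Religion")
)
condition <- c("History", "Religion")

# Keep a row if ANY of the selected columns holds an approved subject.
# Assumes dplyr >= 1.0.4 for if_any().
approved <- filter(data, if_any(c(subject1, subject2), ~ .x %in% condition))

# Base R equivalent: build one logical vector per column, then OR them together.
keep <- Reduce(`|`, lapply(data[c("subject1", "subject2")], `%in%`, condition))
approved_base <- data[keep, ]
```

This also hints at why c(subject1, subject2) %in% condition cannot be used as written: it yields one value per element of the combined vector (eight here), not one per row, so the per-column results have to be combined row-wise instead.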

answered Oct 05 '22 22:10 by Sven Hohenstein