
r - Filter rows that contain a string from a vector

Tags: r, dplyr

I'm looking for a function that takes a data frame column, checks whether each value contains text from a vector of strings, and keeps the rows that match (including partial text matches).

For example, take the following data frame:

animal     |count
aardvark   |8
cat        |2
catfish    |6
dog        |12
dolphin    |3
penguin    |38
prairie dog|59
zebra      |17

and the following vector

c("cat", "dog")

I'd like to run through the 'animal' column, checking whether each value fully or partially matches one of the strings in the vector, and filter out the rows that don't. The resulting data frame would be:

animal     |count
cat        |2
catfish    |6
dog        |12
prairie dog|59

Thank you!

Sean

Sean G asked Aug 02 '16 15:08



3 Answers

Using dplyr, you can try the following, assuming your table is df:

library(dplyr)
library(stringr)
animalList <- c("cat", "dog")
filter(df, str_detect(animal, paste(animalList, collapse="|")))

I personally find the use of dplyr and stringr to be easier to read months later when reviewing my code.
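
To make the call above easy to try, here is a minimal, self-contained sketch that rebuilds the example data from the question and runs the same filter. The `df_mixed` data frame and the case-insensitive variant are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)
library(stringr)

# Example data from the question
df <- data.frame(
  animal = c("aardvark", "cat", "catfish", "dog",
             "dolphin", "penguin", "prairie dog", "zebra"),
  count  = c(8, 2, 6, 12, 3, 38, 59, 17),
  stringsAsFactors = FALSE
)
animalList <- c("cat", "dog")

# "cat|dog": a regular expression that matches either string anywhere in the value
pattern <- paste(animalList, collapse = "|")
result <- filter(df, str_detect(animal, pattern))
print(result$animal)
# "cat" "catfish" "dog" "prairie dog"

# Hypothetical variant: ignore case by wrapping the pattern in stringr::regex()
df_mixed <- data.frame(animal = c("Cat", "DOG", "zebra"),
                       count = c(1, 2, 3),
                       stringsAsFactors = FALSE)
result_ci <- filter(df_mixed, str_detect(animal, regex(pattern, ignore_case = TRUE)))
```

Because the strings are pasted into a regular expression, characters such as `.` or `(` in the vector are treated as regex syntax rather than literal text.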

Megatron answered Oct 22 '22 16:10


For large datasets, the following base R approach can do the job 15x faster than the accepted answer, at least in my experience.

The code builds a logical index marking the rows whose animal value contains any of the search strings, then subsets the data frame once.

#Create vector of strings to match
animals <- c("cat", "dog")

#Start with no rows selected
keep <- rep(FALSE, nrow(df))

#Run the loop: flag rows containing each string (literal match, no regex)
for (a in animals){
    keep <- keep | grepl(a, df$animal, fixed=TRUE)
}

#Keep only the flagged rows
new_df <- df[keep, ]
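
Speed differences like the one mentioned above depend heavily on the data and the machine, so it may be worth timing the candidates on your own data. The following is only a sketch under stated assumptions: `big` is synthetic data sampled from the example animals, and `patterns` is the search vector from the question. It times a single-regex filter against a loop of literal (`fixed = TRUE`) matches and checks that both return the same rows.

```r
set.seed(1)

# Hypothetical synthetic data: 200,000 rows sampled from the example animals
animal_pool <- c("aardvark", "cat", "catfish", "dog",
                 "dolphin", "penguin", "prairie dog", "zebra")
big <- data.frame(
  animal = sample(animal_pool, 2e5, replace = TRUE),
  count  = sample(1:60, 2e5, replace = TRUE),
  stringsAsFactors = FALSE
)
patterns <- c("cat", "dog")

# Approach A: one regular expression with alternation ("cat|dog")
time_regex <- system.time(
  res_regex <- big[grepl(paste(patterns, collapse = "|"), big$animal), ]
)

# Approach B: one literal match per string, combined with a logical OR
time_loop <- system.time({
  keep <- rep(FALSE, nrow(big))
  for (p in patterns) keep <- keep | grepl(p, big$animal, fixed = TRUE)
  res_loop <- big[keep, ]
})

# Both approaches should select exactly the same rows
stopifnot(identical(res_regex, res_loop))

# Elapsed seconds for each approach; the numbers will vary between runs
print(c(regex = time_regex[["elapsed"]], loop = time_loop[["elapsed"]]))
```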
Sorlac answered Oct 22 '22 18:10


We can use grep, where df1 is the data frame and v1 is the vector of strings:

df1[grep(paste(v1, collapse="|"), df1$animal),]

Or using dplyr

df1 %>%
    filter(grepl(paste(v1, collapse="|"), animal))
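
A minimal base R sketch of the same idea on the example data from the question, assuming `df1` and `v1` hold that data frame and search vector. It also shows one edge case worth knowing: an empty search vector collapses to the empty pattern `""`, which matches every row. The `filter_any` helper is hypothetical, added here only to illustrate one way to guard against that.

```r
# Example data from the question; df1 and v1 follow the names used above
df1 <- data.frame(
  animal = c("aardvark", "cat", "catfish", "dog",
             "dolphin", "penguin", "prairie dog", "zebra"),
  count  = c(8, 2, 6, 12, 3, 38, 59, 17),
  stringsAsFactors = FALSE
)
v1 <- c("cat", "dog")

# grep() returns matching row indices, grepl() returns a logical mask;
# both select the same rows here
by_index <- df1[grep(paste(v1, collapse = "|"), df1$animal), ]
by_mask  <- df1[grepl(paste(v1, collapse = "|"), df1$animal), ]
print(by_mask$animal)
# "cat" "catfish" "dog" "prairie dog"

# Edge case: an empty vector collapses to "", and "" matches everything
empty <- character(0)
n_all <- nrow(df1[grepl(paste(empty, collapse = "|"), df1$animal), ])
print(n_all)
# 8

# Hypothetical helper: return zero rows when there is nothing to search for
filter_any <- function(data, column, strings) {
  if (length(strings) == 0) return(data[0, , drop = FALSE])
  data[grepl(paste(strings, collapse = "|"), data[[column]]), , drop = FALSE]
}
```

With this helper, `filter_any(df1, "animal", v1)` returns the four matching rows, while `filter_any(df1, "animal", character(0))` returns an empty data frame with the same columns.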
akrun answered Oct 22 '22 17:10