I'm looking for a function that takes a dataframe column, checks if it contains text from a vector of strings, and filters it upon match (including a partial text match).
For example, take the following data frame:
animal     |count
aardvark   |8
cat        |2
catfish    |6
dog        |12
dolphin    |3
penguin    |38
prairie dog|59
zebra      |17
and the following vector:
c("cat", "dog")
I'd like to run through the 'animal' column, check whether each value fully or partially matches one of the strings in the vector, and filter out the rows that don't. The resulting data frame would be:
animal     |count
cat        |2
catfish    |6
dog        |12
prairie dog|59
Thank you!
Sean
In this article, we are going to discuss how to filter a vector in the R programming language. Filtering a vector means keeping the elements that satisfy a condition and discarding the rest.
Using dplyr, you can try the following, assuming your table is df:
library(dplyr)
library(stringr)
animalList <- c("cat", "dog")
filter(df, str_detect(animal, paste(animalList, collapse="|")))
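To see what the collapsed pattern looks like, the example data from the question can be built by hand. The following is only a minimal sketch: it uses base R's grepl in place of str_detect so that it runs without extra packages, and the names pattern and result are introduced here for illustration.

```r
# Example data from the question
df <- data.frame(
  animal = c("aardvark", "cat", "catfish", "dog", "dolphin",
             "penguin", "prairie dog", "zebra"),
  count = c(8, 2, 6, 12, 3, 38, 59, 17),
  stringsAsFactors = FALSE
)
animalList <- c("cat", "dog")

# Collapse the vector into a single regular expression: "cat|dog"
pattern <- paste(animalList, collapse = "|")

# Keep rows whose animal contains either string (partial matches included)
result <- df[grepl(pattern, df$animal), ]
print(result)
```

Because the pattern is an alternation, any value containing "cat" or "dog" anywhere in the string is kept, which is why catfish and prairie dog survive the filter.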
I personally find the use of dplyr and stringr to be easier to read months later when reviewing my code.
For large datasets, the following base R approach can do the job 15x faster than the accepted answer. At least that was my experience.
The code generates a new dataframe to store the subsets of rows that match a given value (animal).
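#Create placeholder data frame
new_df <- df[0, ]
#Create vector of unique values, keeping only those that contain a search string
#(this restriction is an assumption; without it the loop copies every row of df)
animals <- unique(df$animal)
animals <- animals[grepl("cat|dog", animals)]
#Run the loop, appending the rows that equal each retained value
for (i in seq_along(animals)) {
  temp <- df[df$animal == animals[i], ]
  new_df <- rbind(new_df, temp)
}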
We can use grep, where df1 is the data frame and v1 is the vector of strings:
df1[grep(paste(v1, collapse="|"), df1$animal),]
Or using dplyr
library(dplyr)
df1 %>%
  filter(grepl(paste(v1, collapse="|"), animal))
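One caveat with collapsing the vector into a pattern: the strings are then interpreted as regular expressions, so characters such as "." or "(" take on special meaning. If the search strings should be matched literally, one possible approach (a sketch with made-up data, not part of the answers above) is to test each string with fixed = TRUE and combine the results:

```r
# Hypothetical data containing regex metacharacters
df1 <- data.frame(
  animal = c("cat", "c.t", "cot (wild)", "dog"),
  count = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
v1 <- c("c.t", "(wild)")

# One logical vector per search string, matched literally (no regex)
matches <- lapply(v1, function(s) grepl(s, df1$animal, fixed = TRUE))

# A row is kept if it matches at least one of the strings
keep <- Reduce(`|`, matches)
literal_result <- df1[keep, ]
print(literal_result)
```

With the regular-expression version, "c.t" would also match "cat", because "." stands for any character; the literal version keeps only "c.t" and "cot (wild)".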