Filter multiple values on a string column in dplyr

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing?

Example: data.frame name = dat

    days      name
    88        Lynn
    11        Tom
    2         Chris
    5         Lisa
    22        Kyla
    1         Tom
    222       Lynn
    2         Lynn

I'd like to filter for Tom and Lynn, for example (that is, keep only the rows with those names).
When I do:

    target <- c("Tom", "Lynn")
    filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length

asked Sep 03 '14 14:09

Tom O

1 Answer

You need %in% instead of ==:

    library(dplyr)
    target <- c("Tom", "Lynn")
    filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

Produces

      days name
    1   88 Lynn
    2   11  Tom
    3    1  Tom
    4  222 Lynn
    5    2 Lynn

To understand why, consider what happens here:

    dat$name == target
    # [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

    Lynn  == Tom
    Tom   == Lynn
    Chris == Tom
    Lisa  == Lynn
    ... continue repeating Tom and Lynn until end of data frame

In this case we don't get an error. I suspect your actual data frame has a number of rows that doesn't allow clean recycling, whereas the sample you provided does (8 rows, a multiple of 2). If the sample had had an odd number of rows, I would have gotten the same message as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:

return TRUE for every value in an odd position that is equal to "Tom", or every value in an even position that is equal to "Lynn".

It so happens that the last value in your sample data frame is in an even position (the 8th) and equal to "Lynn", hence the one TRUE above.
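
To make the recycling visible, here is a minimal sketch in base R. The name vector is rebuilt by hand from the sample in the question, and the variable names recycled, implicit, explicit, and odd_result are just illustrative. It spells out the repetition with rep() and then drops one element to show the odd-length case:

```r
# Sample names, typed in from the question's data frame
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# What == does implicitly: repeat target until it is as long as name
recycled <- rep(target, length.out = length(name))
implicit <- name == target
explicit <- name == recycled
identical(implicit, explicit)  # both comparisons give the same vector

# With 7 elements the length is no longer a multiple of 2.
# tryCatch captures the resulting warning text instead of printing it.
odd_result <- tryCatch(
  name[-8] == target,
  warning = function(w) conditionMessage(w)
)
odd_result
```

In base R the odd-length case is reported as a warning rather than a hard error, which is why tryCatch() with a warning handler is used here.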

To contrast, dat$name %in% target says:

for each value in dat$name, check that it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
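
As a rough illustration of what %in% does under the hood, the sketch below compares it with match(), which is how base R documents it. The name vector is again typed in from the question, and keep and via_match are illustrative names only:

```r
# Sample names, typed in from the question's data frame
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Membership test: one TRUE/FALSE per element of name, no recycling of target
keep <- name %in% target

# The same test written with match(): position in target, or 0 if absent
via_match <- match(name, target, nomatch = 0L) > 0L
identical(keep, via_match)

# Negating the test selects everyone who is NOT in target
excluded <- name[!keep]
excluded
```

The negated form is handy if "filter out" is meant in the sense of dropping Tom and Lynn rather than keeping them; inside dplyr that would be filter(dat, !(name %in% target)).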

Note that your problem has nothing to do with dplyr; it is just a misuse of ==.
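
Since the issue is independent of dplyr, the same filter can be sketched in base R alone. The data frame below is reconstructed by hand from the sample in the question, and base_result and subset_result are illustrative names:

```r
# Sample data frame, reconstructed from the question
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Logical row indexing with %in%
base_result <- dat[dat$name %in% target, ]

# subset() evaluates the condition inside the data frame
subset_result <- subset(dat, name %in% target)

base_result
```

Unlike the dplyr output shown earlier, the base R versions keep the original row names (1, 2, 6, 7, 8) instead of renumbering the rows.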

answered Oct 25 '22 16:10

BrodieG

Tags: string-matching, r, dplyr, multiple-conditions
