Select rows from a data frame based on values in a vector

Tags: r, r-faq, subset

I have data similar to this:

dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))

I want to select rows from this data frame based on the values in the fct variable. For example, if I wish to select rows containing either "a" or "c" I can do this:

dt[dt$fct == 'a' | dt$fct == 'c', ]

which yields

1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

as expected. But my actual data is more complex and I actually want to select rows based on the values in a vector such as

vc <- c('a', 'c')

So I tried

dt[dt$fct == vc, ]

but of course that doesn't work. I know I could code something to loop through the vector and pull out the rows needed and append them to a new dataframe, but I was hoping there was a more elegant way.

So how can I filter/subset my data based on the contents of the vector vc?

338

asked Jul 23 '12 12:07

Joe King

2 Answers

Have a look at ?"%in%".

dt[dt$fct %in% vc,]
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]
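To see why the == attempt from the question picks the wrong rows, here is a minimal sketch. It rebuilds the question's data with data.frame() rather than structure(); the names eq_rows and in_rows are only for illustration. The suppressWarnings() call hides the recycling warning R would normally print here, since 15 is not a multiple of 2.

```r
# Same values as the question's `dt`, rebuilt with data.frame()
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b",
                 "c", "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# `==` compares element by element and recycles vc along fct:
# row 1 vs "a", row 2 vs "c", row 3 vs "a", row 4 vs "c", ...
eq_rows <- suppressWarnings(which(dt$fct == vc))

# `%in%` asks, for each row, whether fct matches ANY element of vc
in_rows <- which(dt$fct %in% vc)

eq_rows  # 1 7 12 14
in_rows  # 1 3 5 7 9 10 12 14
```

So == only keeps rows that happen to line up with the recycled pattern, while %in% (and is.element, which does the same membership test) keeps every row whose value appears anywhere in vc.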

117

answered Sep 20 '22 14:09

johannes

Similar to above, using filter from dplyr:

library(dplyr)
filter(dt, fct %in% vc)
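If you would rather not load dplyr, the same condition should work with base R's subset(). The sketch below rebuilds the question's data so it runs on its own; the names kept and dropped are made up for illustration. It also shows two things that tend to come up next: negating the test, and the fact that filtering leaves the unused factor levels in place.

```r
# Same values as the question's `dt`, rebuilt with data.frame()
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b",
                 "c", "a", "b", "c", "b", "c", "d")),
  X   = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# Base R counterpart of filter(): the condition is evaluated inside dt
kept <- subset(dt, fct %in% vc)

# Negate the membership test to keep the rows NOT listed in vc
dropped <- subset(dt, !(fct %in% vc))

nrow(kept)     # 8
nrow(dropped)  # 7

# Subsetting keeps all four factor levels; droplevels() removes unused ones
levels(kept$fct)              # "a" "b" "c" "d"
levels(droplevels(kept)$fct)  # "a" "c"
```

The droplevels() step only matters if later code (tables, plots, models) should not see the empty "b" and "d" levels.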

answered Sep 20 '22 14:09

Andrew Haynes
