I have data similar to this:
dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))
I want to select rows from this data frame based on the values in the fct variable. For example, if I wish to select rows containing either "a" or "c" I can do this:
dt[dt$fct == 'a' | dt$fct == 'c', ]
which yields
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4
as expected. But my actual data is more complex and I actually want to select rows based on the values in a vector such as
vc <- c('a', 'c')
So I tried
dt[dt$fct == vc, ]
but of course that doesn't work. I know I could code something to loop through the vector and pull out the rows needed and append them to a new dataframe, but I was hoping there was a more elegant way.
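To see why the == attempt returns the wrong rows, here is a minimal sketch. It rebuilds dt with data.frame() rather than structure() for readability, and the names recycled and explicit are only illustrative. The comparison is element by element, so R recycles vc along the column instead of testing each row against both values.

```r
# Rebuild the example data from the question.
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# `==` recycles the shorter vector, so row i is tested against only one of
# the two values. R also warns here because 15 is not a multiple of 2; the
# warning is suppressed to keep the output short.
recycled <- suppressWarnings(as.character(dt$fct) == vc)

# The same comparison with the recycling written out explicitly.
explicit <- as.character(dt$fct) == rep_len(vc, nrow(dt))

identical(recycled, explicit)  # TRUE
which(recycled)                # 1 7 12 14: only 4 of the 8 wanted rows
```

So the result is not an error, just a silent mismatch: rows that hold "a" or "c" are dropped whenever they line up with the other value in the recycled vector.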
So how can I filter/subset my data based on the contents of the vector vc?
If a data frame has a column whose values also appear in a vector, we can subset the data frame by that vector. This is done with single square brackets and the %in% operator.
The way you tell R that you want to select some particular elements (i.e., a 'subset') from a vector is by placing an 'index vector' in square brackets immediately following the name of the vector. For a simple example, try x[1:10] to view the first ten elements of x.
Have a look at ?"%in%".
dt[dt$fct %in% vc,]
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4
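It can help to look at the index vector on its own. The sketch below rebuilds dt with data.frame() for readability and uses a hypothetical variable name, keep, for the logical vector that %in% produces. Negating that vector selects the complement.

```r
# Rebuild the example data from the question.
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

# %in% returns one TRUE/FALSE per element of the left-hand side, so the
# result always has the same length as the column, whatever length vc has.
keep <- dt$fct %in% vc

which(keep)   # 1 3 5 7 9 10 12 14
sum(keep)     # 8 matching rows

dt[keep, ]    # rows whose fct is "a" or "c"
dt[!keep, ]   # the other 7 rows
```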
You could also use ?is.element:
dt[is.element(dt$fct, vc),]
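As a quick check that the two spellings agree on this data, the following sketch compares the results directly. The names via_in and via_is_element are illustrative only, and dt is rebuilt with data.frame() for readability.

```r
# Rebuild the example data from the question.
dt <- data.frame(
  fct = factor(c("a", "b", "c", "d", "c", "d", "a", "b", "c",
                 "a", "b", "c", "b", "c", "d")),
  X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)
)
vc <- c("a", "c")

via_in         <- dt[dt$fct %in% vc, ]
via_is_element <- dt[is.element(dt$fct, vc), ]

# Both calls are thin wrappers around match(), so the subsets are the same.
identical(via_in, via_is_element)  # TRUE
```

Which one to use is mostly a matter of taste; %in% reads more naturally inside an index expression.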
Similar to above, using filter from dplyr:
library(dplyr)
filter(dt, fct %in% vc)