What is the best way to filter rows from a data frame when the values to be removed are stored in a vector? In my case I have a column of dates and want to remove several of them.
I know how to delete the rows corresponding to one day using !=, e.g.:
m[m$date != "01/31/11", ]
To remove several dates, specified in a vector, I tried:
m[m$date != c("01/31/11", "01/30/11"), ]
However, this generates a warning message:
Warning message:
In `!=.default`(m$date, c("01/31/11", "01/30/11")) :
longer object length is not a multiple of shorter object length
Calls: [ ... [.data.frame -> Ops.dates -> NextMethod -> Ops.times -> NextMethod
What is the correct way to apply a filter based on multiple values?
If a data frame has a column whose values also appear in a vector, you can subset the data frame based on that vector using single square brackets and the %in% operator. To drop the matching rows rather than keep them, negate the test with !.
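A minimal sketch of that approach, assuming the date column is stored as plain character strings (the data frame `m` and the vector `drop_dates` here are made up for illustration). It also shows why `!=` is the wrong tool: it recycles the shorter vector and compares position by position instead of testing membership. If the column has a date class, as the `Ops.dates` call in the warning suggests, the vector may need converting to the same class first.

```r
# Hypothetical data; dates kept as character strings for simplicity
m <- data.frame(
  date  = c("01/30/11", "01/31/11", "01/29/11", "02/01/11"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
drop_dates <- c("01/31/11", "01/30/11")

# != recycles drop_dates, so row 1 is compared with "01/31/11",
# row 2 with "01/30/11", row 3 with "01/31/11", and so on.
# Here every comparison is TRUE, so nothing would be removed.
recycled <- m$date != drop_dates

# %in% asks, for each row, whether its date appears anywhere in drop_dates
keep <- !(m$date %in% drop_dates)
kept <- m[keep, ]
print(kept)
```

With four rows and two values no warning is raised at all, because the lengths divide evenly, which makes the `!=` version fail silently.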
With bracket notation on a data frame you can select rows by column value, by index, by name, or by condition. The base R function subset() gives the same results, and the dplyr package provides filter() for the same purpose.
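As a rough illustration, the same filter can be written with subset() and, if the dplyr package happens to be installed, with dplyr::filter(). The data frame `m` and the vector `drop_dates` are hypothetical, with dates held as character strings.

```r
# Hypothetical data; dates kept as character strings for simplicity
m <- data.frame(
  date  = c("01/29/11", "01/30/11", "01/31/11", "02/01/11"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
drop_dates <- c("01/31/11", "01/30/11")

# Bracket notation: a logical condition in the row position
kept_bracket <- m[!(m$date %in% drop_dates), ]

# subset(): column names can be used directly inside the condition
kept_subset <- subset(m, !(date %in% drop_dates))

# dplyr::filter(): only run when the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::filter(m, !(date %in% drop_dates))
}
```

The bracket and subset() forms keep the original row names, while dplyr::filter() renumbers the rows, so the results can differ in that detail even though the same rows are selected.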
The way you tell R that you want to select some particular elements (i.e., a 'subset') from a vector is by placing an 'index vector' in square brackets immediately following the name of the vector. For a simple example, try x[1:10] to view the first ten elements of x.
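A short sketch of the index-vector idea, using a made-up vector `x`. The index can be a set of positions, a set of negative positions to exclude, or a logical vector, and the logical form is what the row filters in this thread rely on.

```r
# Hypothetical vector with 20 elements: 10, 20, ..., 200
x <- seq(10, 200, by = 10)

first_ten     <- x[1:10]      # positive positions: keep elements 1 to 10
without_first <- x[-(1:10)]   # negative positions: drop elements 1 to 10
large_values  <- x[x > 150]   # logical index: keep elements where the test is TRUE

print(first_ten)
print(large_values)
```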
nzcoops is spot on with his suggestion. I posed this question in the R Chat a while back and Paul Teetor suggested defining a new function:
`%notin%` <- function(x,y) !(x %in% y)
Which can then be used as follows:
> foo <- letters[1:6]
> foo[foo %notin% c("a", "c", "e")]
[1] "b" "d" "f"
Needless to say, this little gem is now in my R profile and gets used quite often.
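Applied to the original question, the helper might be used along these lines. This is only a sketch: the data frame `m` is made up, its dates are plain character strings, and `%notin2%` is a hypothetical name for an equivalent definition built with base R's Negate().

```r
# The helper from above
`%notin%` <- function(x, y) !(x %in% y)

# An equivalent definition using Negate(); the name is illustrative only
`%notin2%` <- Negate(`%in%`)

# Hypothetical data; dates kept as character strings for simplicity
m <- data.frame(
  date  = c("01/29/11", "01/30/11", "01/31/11", "02/01/11"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Keep the rows whose date is not one of the unwanted dates
kept  <- m[m$date %notin%  c("01/31/11", "01/30/11"), ]
kept2 <- m[m$date %notin2% c("01/31/11", "01/30/11"), ]
print(kept)
```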