Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subset dataframe by multiple logical conditions of rows to remove

I would like to subset (filter) a dataframe by specifying which rows not (!) to keep in the new dataframe. Here is a simplified sample dataframe:

data
v1 v2 v3 v4
a  v  d  c
a  v  d  d
b  n  p  g
b  d  d  h    
c  k  d  c    
c  r  p  g
d  v  d  x
d  v  d  c
e  v  d  b
e  v  d  c

For example, if a row of column v1 has a "b", "d", or "e", I want to get rid of that row of observations, producing the following dataframe:

v1 v2 v3 v4
a  v  d  c
a  v  d  d
c  k  d  c    
c  r  p  g

I have been successful at subsetting based on one condition at a time. For example, here I remove rows where v1 contains a "b":

sub.data <- data[data[ , 1] != "b", ]

However, I have many, many such conditions, so doing it one at a time is not desirable. I have not been successful with the following:

sub.data <- data[data[ , 1] != c("b", "d", "e")

or

sub.data <- subset(data, data[ , 1] != c("b", "d", "e"))

I've tried some other things as well, like !%in%, but that doesn't seem to exist. Any ideas?

like image 823
Jota Avatar asked Jun 05 '11 16:06

Jota


People also ask

How do I remove rows from a Dataframe based on condition in R?

For example, we can use the subset() function if we want to drop a row based on a condition. If we prefer to work with the Tidyverse package, we can use the filter() function to remove (or select) rows based on values in a column (conditionally, that is, and the same as using subset).

How do I remove rows from two conditions in R?

To remove rows of data from a dataframe based on multiple conditional statements. We use square brackets [ ] with the dataframe and put multiple conditional statements along with AND or OR operator inside it. This slices the dataframe and removes all the rows that do not satisfy the given conditions.

How do I filter a Dataframe in multiple conditions in R?

Method 1: Using indexing method and which() function Multiple conditions can also be combined using which() method in R. The which() function in R returns the position of the value which satisfies the given condition. The %in% operator is used to check a value in the vector specified.


3 Answers

Try this

subset(data, !(v1 %in% c("b","d","e")))
like image 146
chl Avatar answered Oct 12 '22 10:10

chl


The ! should be around the outside of the statement:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
like image 25
Andrie Avatar answered Oct 12 '22 12:10

Andrie


You can also accomplish this by breaking things up into separate logical statements by including & to separate the statements.

subset(my.df, my.df$v1 != "b" & my.df$v1 != "d" & my.df$v1 != "e")

This is not elegant and takes more code but might be more readable to newer R users. As pointed out in a comment above, subset is a "convenience" function that is best used when working interactively.

like image 10
N Brouwer Avatar answered Oct 12 '22 10:10

N Brouwer