Python pandas remove rows where multiple conditions are not met

Let's say I have a dataframe like this:

   id  num
0   1    1
1   2    2
2   3    1
3   4    2
4   1    1
5   2    2
6   3    1
7   4    2

The above can be generated with this for testing purposes:

import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

Now, I want to keep only the rows where a certain condition is met: id is not 1 AND num is not 1. Essentially, I want to remove the rows with index 0 and 4. For my actual dataset, it's easier to remove the rows I don't want than to specify the rows that I do want.

I have tried this:

test = test[(test['id'] != 1) & (test['num'] != 1)]

However, that gives me this:

   id  num
1   2    2
3   4    2
5   2    2
7   4    2

It seems to have removed all rows where id is 1 OR num is 1.

I've seen a number of other questions where the answer is the one I used above, but it doesn't seem to work in my case.

Simon asked Aug 08 '16 09:08


1 Answer

If you change the boolean conditions to test for equality, wrap the combined condition in an extra pair of parentheses, and invert it with ~, you get the desired behaviour:

In [14]:
test = test[~((test['id'] == 1) & (test['num'] == 1))]
test

Out[14]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2
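
Since the question says it is easier to describe the rows to remove, one way to write this is to give the unwanted rows a named mask and negate it. This is only a minimal sketch against the example frame; the names to_drop, kept and kept_via_drop are mine, not from the question:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# Rows to remove: id is 1 AND num is 1 (illustrative name).
to_drop = (test['id'] == 1) & (test['num'] == 1)

# Keep every row that is NOT in the unwanted set.
kept = test[~to_drop]

# Same result via drop(), passing the index labels of the unwanted rows.
kept_via_drop = test.drop(index=test[to_drop].index)

print(kept)
```

Both forms leave the original test untouched, so the mask can be inspected before anything is overwritten.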

I also think your understanding of the boolean logic is incorrect. What you want is to or the conditions:

In [22]:
test = test[(test['id'] != 1) | (test['num'] != 1)]
test

Out[22]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2

If you think about what each condition means on its own, the first excludes every row where 'id' is equal to 1, and the second does the same for the 'num' column:

In [24]:
test[test['id'] != 1]

Out[24]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2

In [25]:
test[test['num'] != 1]

Out[25]:
   id  num
1   2    2
3   4    2
5   2    2
7   4    2

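So really you wanted to or (|) the above conditions: combining them with & keeps only the rows that pass both tests, whereas | keeps the rows that pass at least one of them.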
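
The reason the two working forms agree is De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B). A quick check on the example frame, as a sketch; the variable names here are illustrative only:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

a = test['id'] == 1
b = test['num'] == 1

negated_and = ~(a & b)          # NOT (id == 1 AND num == 1)
or_of_negations = (~a) | (~b)   # id != 1 OR num != 1
and_of_negations = (~a) & (~b)  # id != 1 AND num != 1 (the original attempt)

print(negated_and.equals(or_of_negations))   # True
print(negated_and.equals(and_of_negations))  # False
```

The second comparison is False because the & version also drops rows 2 and 6, where only 'num' is 1.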

EdChum answered Nov 13 '22 07:11