Let's say I have a dataframe like this:
   id  num
0   1    1
1   2    2
2   3    1
3   4    2
4   1    1
5   2    2
6   3    1
7   4    2
The above can be generated with this for testing purposes:
import numpy as np
import pandas as pd
test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})
Now, I want to keep only the rows where a certain condition is met: id is not 1 AND num is not 1. Essentially, I want to remove the rows with index 0 and 4. For my actual dataset, it's easier to remove the rows I don't want than to specify the rows I do want.
I have tried this:
test = test[(test['id'] != 1) & (test['num'] != 1)]
However, that gives me this:
   id  num
1   2    2
3   4    2
5   2    2
7   4    2
It seems to have removed all rows where id is 1 OR num is 1.
I've seen a number of other questions where the answer is the one I used above, but it doesn't seem to work in my case.
If you change the boolean conditions to test for equality and invert the combined condition with ~, enclosing both conditions in an additional pair of parentheses, then you get the desired behaviour:
In [14]:
test = test[~((test['id'] == 1) & (test['num'] == 1))]
test
Out[14]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2
I also think your understanding of the boolean logic is incorrect: what you want is to OR (|) the conditions:
In [22]:
test = test[(test['id'] != 1) | (test['num'] != 1)]
test
Out[22]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2
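The two forms shown above are equivalent by De Morgan's law: not (A and B) is the same as (not A) or (not B). A minimal sketch that checks this on the test frame from the question; the names unwanted, kept_by_negation and kept_by_or are illustrative and not from the original post:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# Describe the rows to drop: both columns equal 1.
# (`unwanted` is a hypothetical name used only for this sketch.)
unwanted = (test['id'] == 1) & (test['num'] == 1)

# Form 1: negate the combined "unwanted" mask.
kept_by_negation = test[~unwanted]

# Form 2: OR the two negated conditions.
kept_by_or = test[(test['id'] != 1) | (test['num'] != 1)]

# Both selections contain the same rows.
print(kept_by_negation.equals(kept_by_or))   # True
print(kept_by_negation.index.tolist())       # [1, 2, 3, 5, 6, 7]
```

Since you said it is easier to describe the rows you don't want, the first form may be the more natural one to maintain: write the "unwanted" condition directly and negate it once.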
Consider what each condition does on its own. The first excludes every row where 'id' equals 1, and the second does the same for the 'num' column, so combining them with & keeps only the rows that pass both tests:
In [24]:
test[test['id'] != 1]
Out[24]:
   id  num
1   2    2
2   3    1
3   4    2
5   2    2
6   3    1
7   4    2
In [25]:
test[test['num'] != 1]
Out[25]:
   id  num
1   2    2
3   4    2
5   2    2
7   4    2
So really you wanted to OR (|) the above conditions: a row is then kept if either test passes, and dropped only when both columns equal 1.
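To see the difference row by row, it can help to lay the individual masks next to their & and | combinations. This is only an illustrative sketch on the test frame from the question; the names id_ok, num_ok and masks are made up for the example:

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# The two individual conditions (hypothetical names).
id_ok = test['id'] != 1
num_ok = test['num'] != 1

# Put each mask and both combinations side by side for inspection.
masks = pd.DataFrame({'id_ok': id_ok,
                      'num_ok': num_ok,
                      'and': id_ok & num_ok,
                      'or': id_ok | num_ok})
print(masks)

# Index labels that each combination would drop (mask is False).
dropped_by_and = masks.index[~masks['and']].tolist()
dropped_by_or = masks.index[~masks['or']].tolist()
print(dropped_by_and)   # [0, 2, 4, 6]
print(dropped_by_or)    # [0, 4]
```

The & combination is False whenever either column is 1, which is why rows 2 and 6 disappeared in the original attempt; the | combination is False only for rows 0 and 4.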