Surprisingly, I can't find an analogue of SQL's "NOT IN" operator for pandas DataFrames.
import pandas as pd

A = pd.DataFrame({'a': [6, 8, 3, 9, 5],
                  'b': ['II', 'I', 'I', 'III', 'II']})
B = pd.DataFrame({'c': [1, 2, 3, 4, 5]})
I want all rows from A whose a value does not appear in B's c.
Something like:
A = A[ A.a not in B.c]
The isin() method is the pandas equivalent of SQL's well-known IN expression. It returns a boolean Series indicating whether each element is contained in the specified values.
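As a minimal sketch of that behaviour (the Series `s` and the value list here are made up for illustration, not taken from the question), `isin` gives one boolean per element, and `~` flips the mask to get the NOT IN form:

```python
import pandas as pd

# Illustrative data only; not the asker's DataFrame.
s = pd.Series([6, 8, 3, 9, 5], name="a")

mask = s.isin([3, 5])   # roughly SQL: a IN (3, 5)
not_mask = ~mask        # roughly SQL: a NOT IN (3, 5)

print(mask.tolist())      # [False, False, True, False, True]
print(not_mask.tolist())  # [True, True, False, True, False]
```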
I think you are really close. You need isin with ~ to negate the boolean mask, and instead of a list you can pass the Series B.c:
print (~A.a.isin(B.c))
0 True
1 True
2 False
3 True
4 False
Name: a, dtype: bool
A = A[~A.a.isin(B.c)]
print (A)
a b
0 6 II
1 8 I
3 9 III
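For comparison, here is a self-contained sketch that repeats the isin approach and sets it beside an alternative that is not part of the answer above: a left merge with `indicator=True`, keeping only the rows marked `left_only`. The names `by_isin`, `merged` and `by_merge` are just illustrative. The merge form may be handy when the NOT IN condition spans several columns, but for a single column the isin version is shorter.

```python
import pandas as pd

A = pd.DataFrame({'a': [6, 8, 3, 9, 5],
                  'b': ['II', 'I', 'I', 'III', 'II']})
B = pd.DataFrame({'c': [1, 2, 3, 4, 5]})

# Approach from the answer: negate the isin mask.
by_isin = A[~A['a'].isin(B['c'])]

# Alternative sketch (an assumption, not from the answer): anti-join via merge.
# drop_duplicates() keeps repeated values in B from multiplying rows of A.
merged = A.merge(B.drop_duplicates(), left_on='a', right_on='c',
                 how='left', indicator=True)
by_merge = merged.loc[merged['_merge'] == 'left_only', ['a', 'b']]

print(by_isin)
print(by_merge)
```

Both frames should contain the rows where a is 6, 8 and 9.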