
Pandas analogue of SQL's "NOT IN" operator

Tags: python, pandas

Surprisingly, I can't find an analogue of SQL's "NOT IN" operator for pandas DataFrames.

import pandas as pd

A = pd.DataFrame({'a': [6, 8, 3, 9, 5],
                  'b': ['II', 'I', 'I', 'III', 'II']})

B = pd.DataFrame({'c': [1, 2, 3, 4, 5]})

I want all rows from A whose value in column a does not appear in B's column c. Something like:

A = A[ A.a not in B.c]
Ladenkov Vladislav asked Apr 06 '17 13:04



1 Answer

I think you are really close: you need isin, with ~ to negate the boolean mask. Also, you don't need a list here; you can pass the Series B.c directly:

print (~A.a.isin(B.c))
0     True
1     True
2    False
3     True
4    False
Name: a, dtype: bool

A = A[~A.a.isin(B.c)]
print (A)
   a    b
0  6   II
1  8    I
3  9  III
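
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach, using the data from the question. The names not_in_mask, filtered and renumbered are my own, chosen for illustration only; the final reset_index step is optional and assumes you don't need the original row labels.

```python
import pandas as pd

A = pd.DataFrame({"a": [6, 8, 3, 9, 5],
                  "b": ["II", "I", "I", "III", "II"]})
B = pd.DataFrame({"c": [1, 2, 3, 4, 5]})

# isin() returns a boolean Series: True where A.a appears among B.c's values.
# The ~ operator inverts that mask, which gives the "NOT IN" behaviour.
not_in_mask = ~A["a"].isin(B["c"])

# Boolean indexing keeps only the rows where the mask is True.
filtered = A[not_in_mask]

# Optional: renumber the rows from 0 if the original index is not needed.
renumbered = filtered.reset_index(drop=True)

print(filtered)
```

Note that filtering keeps the original index labels (0, 1, 3 here), which is why the renumbering step is shown separately.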
jezrael answered Sep 20 '22 12:09