 

SQL like NOT IN clause for PySpark data frames

In SQL we can write, for example: select * from table where col1 not in ('A','B');

I was wondering whether there is a PySpark equivalent for this. I found the isin function, which covers the SQL-like IN clause, but nothing for NOT IN.

sh1291 asked Oct 11 '25 18:10



1 Answer

I just had the same issue and found a solution. To negate any condition (represented in PySpark as a Column), use the negation operator ~. For example:

df.where(~df.flag.isin(1, 2, 3)) # records with flag NOT IN (1, 2, 3)
Mariusz answered Oct 14 '25 17:10



