Dataframe filtering with condition applied to list of columns

I want to filter a PySpark DataFrame so that rows where any of the string columns in a list are empty get dropped. This is what I tried:

from pyspark.sql.functions import col

df = df.where(all([col(x) != '' for x in col_list]))

but it fails with:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
asked Sep 01 '25 by CyborgDroid
1 Answer

You can use reduce from functools to combine the per-column conditions with &, which simulates all, like this:

from functools import reduce
from pyspark.sql import functions as F

# Keep only the rows in which every column in col_list is non-empty.
spark_df.where(reduce(lambda x, y: x & y, (F.col(x) != '' for x in col_list))).show()
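
For reference, here is a minimal, self-contained sketch of the same approach. The SparkSession setup, the sample data, and the column names below are illustrative assumptions, not part of the original question:

from functools import reduce
import operator

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: two string columns, some values empty.
df = spark.createDataFrame(
    [("Alice", "Paris"), ("", "London"), ("Bob", "")],
    ["name", "city"],
)
col_list = ["name", "city"]

# Combine the per-column conditions with & (operator.and_), keeping only
# the rows in which every listed column is non-empty.
condition = reduce(operator.and_, (F.col(c) != "" for c in col_list))
df.where(condition).show()
# Only the ("Alice", "Paris") row survives the filter.

Using operator.and_ instead of lambda x, y: x & y is equivalent; either way, reduce folds the per-column Column expressions into a single boolean expression that Spark can evaluate.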
answered Sep 06 '25 by Sreeram TP