Python / Pyspark - Count NULL, empty and NaN

Tags: python, pyspark

I want to count NULL, empty, and NaN values in a column. I tried it like this:

df.filter( (df["ID"] == "") | (df["ID"].isNull()) | ( df["ID"].isnan()) ).count()

But I always get this error message:

TypeError: 'Column' object is not callable

Does anyone have an idea what might be the problem?

Many thanks in advance!

asked Jan 12 '18 by qwertz


People also ask

Does PySpark count include null?

Does PySpark count include null? A plain df.count() counts every row, whether or not it contains nulls. The count of NULL values in a column of a PySpark dataframe is obtained using the isNull() function, and the count of NaN values using the isnan() function.
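
As a rough sketch (the column name and sample data are made up for illustration), the two checks are typically combined like this:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, isnan

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: one regular value, one NULL, one NaN
df = spark.createDataFrame([(1.0,), (None,), (float("nan"),)], ["ID"])

# isNull() catches NULL/None, isnan() catches NaN
df.filter(col("ID").isNull()).count()   # 1
df.filter(isnan(col("ID"))).count()     # 1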

Is null and is not null in PySpark?

The isNull() function is used to check whether the current expression or column value is NULL/None; it returns True when the value is NULL/None. The complementary isNotNull() function returns True for non-null values. Both are part of pyspark.sql.
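
A minimal sketch of both checks on a made-up string column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), (None,)], ["name"])

df.filter(col("name").isNull()).show()     # rows where name IS NULL
df.filter(col("name").isNotNull()).show()  # rows where name IS NOT NULL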

How do you count distinct in PySpark?

In PySpark, there are two ways to get the count of distinct values. We can chain the distinct() and count() methods of a DataFrame to get the distinct count. Another way is to use the countDistinct() aggregate function, which returns the distinct value count of the selected columns.
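
For illustration, a sketch of both approaches on a made-up column (note that distinct() keeps NULL as its own value, while countDistinct() ignores NULLs):

from pyspark.sql import SparkSession
from pyspark.sql.functions import countDistinct

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a",), ("a",), ("b",), (None,)], ["ID"])

df.select("ID").distinct().count()              # 3 (includes NULL)
df.select(countDistinct("ID")).collect()[0][0]  # 2 (ignores NULL)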

How do you replace NaN with 0 in PySpark?

In PySpark, DataFrame.fillna() or DataFrameNaFunctions.fill() is used to replace NULL/None values on all or selected DataFrame columns with zero (0), an empty string, a space, or any constant literal value.
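
A small sketch on a made-up numeric column; for numeric columns, fillna() replaces both NULL and NaN:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,), (None,), (float("nan"),)], ["ID"])

# Replace NULL and NaN in the numeric "ID" column with 0
df.fillna(0, subset=["ID"]).show()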


1 Answer

isnan is not a method of the Column class; you need to import it from pyspark.sql.functions:

from pyspark.sql.functions import isnan

And use it like:

df.filter((df["ID"] == "") | df["ID"].isNull() | isnan(df["ID"])).count()
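For context, here is a minimal, self-contained sketch of the corrected filter (the sample data and the numeric column are made up; the empty-string check from the question only matters for string-typed columns, so it is omitted here):

from pyspark.sql import SparkSession
from pyspark.sql.functions import isnan

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0,), (None,), (float("nan"),)], ["ID"])

# With isnan imported as a function, NULL and NaN rows are both matched
df.filter(df["ID"].isNull() | isnan(df["ID"])).count()  # 2
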
answered Oct 02 '22 by Psidom