I want to count NULL, empty-string, and NaN values in a column. I tried it like this:
df.filter((df["ID"] == "") | (df["ID"].isNull()) | (df["ID"].isnan())).count()
But I always get this error message:
TypeError: 'Column' object is not callable
Does anyone have an idea what might be the problem?
Many thanks in advance!
isnan is not a method of the Column class; it is a standalone function, so you need to import it:
from pyspark.sql.functions import isnan
And use it like:
df.filter((df["ID"] == "") | df["ID"].isNull() | isnan(df["ID"])).count()
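If you don't have a Spark session handy, the same null/empty/NaN predicate logic can be sketched in plain Python (the sample values below are hypothetical, chosen only to illustrate the three missing-value cases):

```python
import math

# Hypothetical column values: None plays the role of SQL NULL,
# "" is an empty string, float("nan") is NaN.
values = [None, "", "abc", float("nan"), "xyz", None]

def is_missing(v):
    """Mirror the PySpark filter:
    df["ID"].isNull() | (df["ID"] == "") | isnan(df["ID"])."""
    if v is None:                               # NULL / None
        return True
    if v == "":                                 # empty string
        return True
    if isinstance(v, float) and math.isnan(v):  # NaN
        return True
    return False

missing_count = sum(is_missing(v) for v in values)
print(missing_count)  # 4 (two None, one "", one NaN)
```

Note that in Spark, as here, NaN and NULL are distinct: isnan catches only the floating-point NaN value, while isNull catches missing values, which is why the filter needs both conditions.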