In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?
I know about the function pd.isnan
, but this returns a DataFrame of booleans for each element. This post right here doesn't exactly answer my question either.
NaN stands for Not A Number and is one of the common ways to represent the missing value in the data. It is a special floating-point value and cannot be converted to any other type than float. NaN value is one of the major problems in Data Analysis.
In order to check null values in Pandas DataFrame, we use isnull() function this function return dataframe of Boolean values which are True for NaN values.
shape() method returns the number of rows and number of columns as a tuple, you can use this to check if pandas DataFrame is empty. DataFrame. shape[0] return number of rows. If you have no rows then it gives you 0 and comparing it with 0 gives you True .
jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:
df.isnull().values.any()
import numpy as np import pandas as pd import perfplot def setup(n): df = pd.DataFrame(np.random.randn(n)) df[df > 0.9] = np.nan return df def isnull_any(df): return df.isnull().any() def isnull_values_sum(df): return df.isnull().values.sum() > 0 def isnull_sum(df): return df.isnull().sum() > 0 def isnull_values_any(df): return df.isnull().values.any() perfplot.save( "out.png", setup=setup, kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any], n_range=[2 ** k for k in range(25)], )
df.isnull().sum().sum()
is a bit slower, but of course, has additional information -- the number of NaNs
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With