
How to check if any value is NaN in a Pandas DataFrame

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isnull, but it returns a DataFrame of booleans, one per element. This post right here doesn't exactly answer my question either.

hlin117 asked Apr 09 '15 05:04


People also ask

Is NaN in pandas DataFrame?

NaN stands for Not a Number and is one of the common ways to represent a missing value in data. It is a special floating-point value, so it cannot be stored in a plain integer column; pandas converts such a column to float. Handling NaN values is a common problem in data analysis.
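As a rough illustration of the floating-point point above, here is a minimal sketch. The variable names are made up for the example and nothing here comes from the original post.

```python
import numpy as np
import pandas as pd

# A Series of plain integers keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# Introducing a NaN forces the values to be stored as floats.
with_nan = pd.Series([1, 2, np.nan])
with_nan_dtype = str(with_nan.dtype)

# NaN never compares equal to itself, so == cannot be used to find it.
nan_equals_itself = np.nan == np.nan

# pd.isna is the reliable way to test a single value.
nan_detected = bool(pd.isna(np.nan))
```

This is why the checks in this thread go through isnull() instead of comparing against np.nan directly.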

How do you check if there is any null values in Python DataFrame?

To check for null values in a pandas DataFrame, use the isnull() method. It returns a DataFrame of Boolean values that are True wherever the original value is NaN.
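A small sketch of what that Boolean DataFrame looks like, assuming a toy two-column frame invented for this example:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one NaN in a numeric column, one None in a text column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

# Element-wise mask with the same shape as df; True marks a missing value.
mask = df.isnull()

# Reducing the mask per column tells you which columns contain a missing value.
has_nan_per_column = mask.any()
```

Note that isnull() flags None as well as NaN, so the mask covers both kinds of missing value.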

How do you check if a cell is empty in pandas DataFrame?

The DataFrame.shape attribute returns the number of rows and the number of columns as a tuple, which you can use to check whether a pandas DataFrame is empty. DataFrame.shape[0] returns the number of rows. If there are no rows it is 0, so comparing it with 0 gives True.
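For what it's worth, an empty DataFrame and a DataFrame that only holds NaN are different things. A minimal sketch with two made-up frames:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: one with no rows at all, one with a single NaN cell.
no_rows = pd.DataFrame(columns=["a", "b"])
only_nan = pd.DataFrame({"a": [np.nan]})

# shape is an attribute (no parentheses); shape[0] is the row count.
no_rows_is_empty = no_rows.shape[0] == 0
only_nan_is_empty = only_nan.shape[0] == 0

# The NaN-only frame has a row, so it is not empty, but it does contain a NaN.
only_nan_has_nan = bool(only_nan.isnull().values.any())
```

pandas also offers the DataFrame.empty property, which gives the same answer for the row-count check here.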


1 Answer

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any() 
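A minimal sketch of that one-liner in use. The has_nan name and the two small frames are only assumptions for the example, not part of the answer itself.

```python
import numpy as np
import pandas as pd


def has_nan(df):
    """Return True if at least one cell of df is missing (hypothetical helper)."""
    # .values exposes the Boolean mask as a NumPy array,
    # and .any() reduces the whole array to a single truth value.
    return bool(df.isnull().values.any())


clean = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
dirty = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

clean_result = has_nan(clean)
dirty_result = has_nan(dirty)
```

Wrapping the result in bool() just turns the NumPy Boolean into a plain Python bool, which is handy in if statements and asserts.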


import numpy as np
import pandas as pd
import perfplot


def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df


def isnull_any(df):
    return df.isnull().any()


def isnull_values_sum(df):
    return df.isnull().values.sum() > 0


def isnull_sum(df):
    return df.isnull().sum() > 0


def isnull_values_any(df):
    return df.isnull().values.any()


perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)

df.isnull().sum().sum() is a bit slower, but of course, has additional information -- the number of NaNs.
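To make the counting variant concrete, here is a small sketch on a toy frame (the data is invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical data: two NaNs in column "a", one in column "b".
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [np.nan, 5.0, 6.0]})

# The first sum() counts missing values per column.
per_column = df.isnull().sum()

# The second sum() adds the per-column counts into one total.
total = int(df.isnull().sum().sum())

# The yes/no answer falls out of the count as well.
any_missing = total > 0
```

So if you need the per-column breakdown anyway, the extra cost of sum() over any() may be worth paying.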

S Anand answered Sep 22 '22 15:09