
How to find which columns contain any NaN value in Pandas dataframe

Given a pandas dataframe containing possible NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?

jesperk.eth asked Mar 25 '16 18:03



People also ask

How do you check if a column has NaN values?

To do this, use df.isna().any(). It checks each column and returns True if the column contains any missing (NaN) values, or False if it contains none.

How do you check if any column has null value in pandas?

To check for null values in a pandas DataFrame, you can use notnull(), which returns a DataFrame of Boolean values that are False for NaN values and True otherwise. isnull() returns the inverse.
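Since notnull() marks NaN as False, a column contains a NaN exactly when not all of its entries are True. A minimal sketch of that idea follows; the small frame and the names `present` and `cols_with_nan` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (not taken from the question).
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# notnull() is True for present values and False for NaN.
present = df.notnull()

# A column has a NaN exactly when not all of its entries are present.
cols_with_nan = df.columns[~present.all()].tolist()
print(cols_with_nan)
```

In practice df.isnull().any() expresses the same check more directly.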

How do you check if pandas series has NaN values?

isna() in the pandas library checks whether each value is null/NaN. It returns True where a value is NaN/null and False elsewhere.
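For a Series, the element-wise result can be reduced to a single yes/no answer with any(). A short sketch, assuming a hypothetical Series `s`:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# Element-wise check: one Boolean per entry.
element_wise = s.isna()

# Collapse to a single Boolean: does the Series contain any NaN at all?
contains_nan = bool(element_wise.any())
print(element_wise.tolist(), contains_nan)
```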


1 Answer

UPDATE: using Pandas 0.22.0

Newer pandas versions provide the methods DataFrame.isna() and DataFrame.notna(), which are aliases of isnull() and notnull():

    In [71]: df
    Out[71]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1

    In [72]: df.isna().any()
    Out[72]:
    a     True
    b     True
    c    False
    dtype: bool

As a list of columns:

    In [74]: df.columns[df.isna().any()].tolist()
    Out[74]: ['a', 'b']
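To reproduce this outside IPython, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values shown in the output above, and the names `has_nan` and `nan_columns` are only illustrative choices.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (values copied from the answer's output).
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_nan = df.isna().any()

# Use the mask to index the column labels, then convert to a plain list.
nan_columns = df.columns[has_nan].tolist()
print(nan_columns)
```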

To select those columns (the ones containing at least one NaN value):

    In [73]: df.loc[:, df.isna().any()]
    Out[73]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
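The same Boolean mask can be reused for column selection, and negating it with ~ gives the columns without any NaN. A small sketch on a shortened, made-up frame (the names `mask`, `with_nan`, and `without_nan` are illustrative):

```python
import numpy as np
import pandas as pd

# Shortened illustrative frame with the same column layout.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# One Boolean per column: True if the column holds at least one NaN.
mask = df.isna().any()

# Columns that contain at least one NaN.
with_nan = df.loc[:, mask]

# Columns that contain no NaN at all (inverted mask).
without_nan = df.loc[:, ~mask]

print(list(with_nan.columns), list(without_nan.columns))
```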

OLD answer:

Try to use isnull():

    In [97]: df
    Out[97]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1

    In [98]: pd.isnull(df).sum() > 0
    Out[98]:
    a     True
    b     True
    c    False
    dtype: bool
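The sum() here counts the NaN values per column (True counts as 1), so comparing with 0 gives the same answer as asking whether any value is NaN. A quick sketch to check that equivalence on a made-up frame; `nan_counts`, `by_sum`, and `by_any` are illustrative names:

```python
import numpy as np
import pandas as pd

# Illustrative frame: one NaN in "a", two in "b", none in "c".
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# Number of NaN values in each column.
nan_counts = pd.isnull(df).sum()

# Two ways to ask "does this column contain a NaN?".
by_sum = nan_counts > 0
by_any = df.isnull().any()

print(nan_counts.tolist(), by_sum.equals(by_any))
```

The sum() form is handy when the counts themselves are also of interest; any() states the intent more directly.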

Or, as @root proposed, a clearer version:

    In [5]: df.isnull().any()
    Out[5]:
    a     True
    b     True
    c    False
    dtype: bool

    In [7]: df.columns[df.isnull().any()].tolist()
    Out[7]: ['a', 'b']

To select a subset (all columns containing at least one NaN value):

    In [31]: df.loc[:, df.isnull().any()]
    Out[31]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
MaxU - stop WAR against UA answered Sep 28 '22 23:09
