Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas find all rows where all values are NaN

Tags:

So I have a dataframe with 5 columns. I would like to pull the indices where all of the columns are NaN. I was using this code:

nan = pd.isnull(df.all) 

but that is just returning false because it is logically saying no not all values in the dataframe are null. There are thousands of entries so I would prefer to not have to loop through and check each entry. Thanks!

like image 578
Steven Setteducati Jr. Avatar asked Aug 10 '16 22:08

Steven Setteducati Jr.


2 Answers

It should just be:

df.isnull().all(1) 

The index can be accessed like:

df.index[df.isnull().all(1)] 

Demonstration

np.random.seed([3,1415]) df = pd.DataFrame(np.random.choice((1, np.nan), (10, 2))) df 

enter image description here

idx = df.index[df.isnull().all(1)] nans = df.ix[idx] nans 

enter image description here


Timing

code

np.random.seed([3,1415]) df = pd.DataFrame(np.random.choice((1, np.nan), (10000, 5))) 

enter image description here

like image 145
piRSquared Avatar answered Oct 24 '22 19:10

piRSquared


Assuming your dataframe is named df, you can use boolean indexing to check if all columns (axis=1) are null. Then take the index of the result.

np.random.seed(0) df = pd.DataFrame(np.random.randn(5, 3)) df.iloc[-2:, :] = np.nan >>> df           0         1         2 0  1.764052  0.400157  0.978738 1  2.240893  1.867558 -0.977278 2  0.950088 -0.151357 -0.103219 3       NaN       NaN       NaN 4       NaN       NaN       NaN  nan = df[df.isnull().all(axis=1)].index  >>> nan Int64Index([3, 4], dtype='int64') 
like image 34
Alexander Avatar answered Oct 24 '22 18:10

Alexander