Retrieve indices of NaN values in a pandas dataframe

Question

For each row that contains NaN values, I am trying to retrieve the indices of the columns where they occur.

import pandas as pd
from numpy import nan as NaN

d = [[11.4, 1.3, 2.0, NaN], [11.4, 1.3, NaN, NaN], [11.4, 1.3, 2.8, 0.7], [NaN, NaN, 2.8, 0.7]]
df = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])
print(df)

      A    B    C    D
0  11.4  1.3  2.0  NaN
1  11.4  1.3  NaN  NaN
2  11.4  1.3  2.8  0.7
3   NaN  NaN  2.8  0.7

I've already done the following:

add a column with the count of NaN for each row
get the indices of each row containing NaN values
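
The question does not show the code for these two steps. One way they might look is sketched below, assuming the same df as above; the names mask, counted, nan_count and rows_with_nan are illustrative and not taken from the original post.

```python
import numpy as np
import pandas as pd

nan = np.nan
d = [[11.4, 1.3, 2.0, nan], [11.4, 1.3, nan, nan],
     [11.4, 1.3, 2.8, 0.7], [nan, nan, 2.8, 0.7]]
df = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Boolean frame: True wherever a cell is NaN.
mask = df.isnull()

# Step 1: a copy of df with an extra column holding the NaN count per row.
counted = df.assign(nan_count=mask.sum(axis=1))

# Step 2: index labels of the rows that contain at least one NaN.
rows_with_nan = df.index[mask.any(axis=1)].tolist()

print(counted['nan_count'].tolist())  # [1, 2, 0, 2]
print(rows_with_nan)                  # [0, 1, 3]
```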

What I want is a list like this (ideally with the column names):

[ ['D'],['C','D'],['A','B'] ]

I hope there is a way to do this without testing every column of every row like this:

if pd.isnull(df.loc[i, column]):

I'm looking for a vectorized pandas approach that can handle my huge dataset.

Thanks in advance.

maxymoo · Accepted Answer

It should be efficient to use a scipy coordinate-format sparse matrix to retrieve the coordinates of the null values:

import scipy.sparse as sp

x,y = sp.coo_matrix(df.isnull()).nonzero()
print(list(zip(x,y)))

[(0, 3), (1, 2), (1, 3), (3, 0), (3, 1)]

Note that I'm calling the nonzero method to output only the coordinates of the nonzero entries in the underlying sparse matrix, since I don't care about the actual values, which are all True.
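
To get from (row, column) coordinates like these to the per-row lists of column names asked for in the question, the pairs can be grouped by row. The following is only a minimal sketch, not part of the original answer: it assumes the same df as in the question, swaps the sparse matrix for np.nonzero on the boolean mask (which gives the same coordinates in row-major order), and uses the illustrative names nan_columns and result.

```python
import numpy as np
import pandas as pd

nan = np.nan
d = [[11.4, 1.3, 2.0, nan], [11.4, 1.3, nan, nan],
     [11.4, 1.3, 2.8, 0.7], [nan, nan, 2.8, 0.7]]
df = pd.DataFrame(data=d, columns=['A', 'B', 'C', 'D'])

# Positions of the null cells, scanned row by row.
rows, cols = np.nonzero(df.isnull().to_numpy())

# Group column labels by row position; rows without NaN never appear.
nan_columns = {}
for r, c in zip(rows, cols):
    nan_columns.setdefault(int(r), []).append(df.columns[c])

# One list of column names per row that contains NaN, in row order.
result = [nan_columns[r] for r in sorted(nan_columns)]
print(result)  # [['D'], ['C', 'D'], ['A', 'B']]
```

The Python loop runs only over the null cells, not over every cell of the frame, so its cost depends on how many values are missing.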

Tags: python, pandas, machine-learning

dooms
