Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find integer index of rows with NaN in pandas dataframe

Tags:

python

pandas

People also ask

Does Panda read NaN na?

This is what Pandas documentation gives: na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.

How do you find the index of a specific value in pandas?

The get_loc() function is used to find the index of any column in the Python pandas dataframe. We simply pass the column name to get_loc() function to find index.


Here is a simpler solution:

inds = pd.isnull(df).any(1).nonzero()[0]

In [9]: df
Out[9]: 
          0         1
0  0.450319  0.062595
1 -0.673058  0.156073
2 -0.871179 -0.118575
3  0.594188       NaN
4 -1.017903 -0.484744
5  0.860375  0.239265
6 -0.640070       NaN
7 -0.535802  1.632932
8  0.876523 -0.153634
9 -0.686914  0.131185

In [10]: pd.isnull(df).any(1).nonzero()[0]
Out[10]: array([3, 6])

For DataFrame df:

import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]

will give you back the MultiIndex that you can use to index back into df, e.g.:

df['a'].ix[index[0]]
>>> 1.452354

For the integer index:

df_index = df.index.values.tolist()
[df_index.index(i) for i in index]
>>> [3, 6]

One line solution. However it works for one column only.

df.loc[pandas.isna(df["b"]), :].index

And just in case, if you want to find the coordinates of 'nan' for all the columns instead (supposing they are all numericals), here you go:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

df
   0  1  2    3    4  5
0  0  1  3  4.0  NaN  2
1  3  5  6  NaN  3.0  3

np.where(np.asanyarray(np.isnan(df)))
(array([0, 1]), array([4, 3]))

Don't know if this is too late but you can use np.where to find the indices of non values as such:

indices = list(np.where(df['b'].isna()[0]))

in the case you have datetime index and you want to have the values:

df.loc[pd.isnull(df).any(1), :].index.values