Find index of the first NaN value in the row

I have a dataframe that looks like this:

import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')


    a   b   c   d   e   f
0   0.0 1.0 2.0 3.0 4.0 NaN
1   0.0 1.0 2.0 NaN NaN NaN
2   0.0 1.0 5.0 6.0 7.0 NaN
3   0.0 3.0 4.0 6.0 8.0 10.0

For each row, I'm trying to find the index of the column that holds the first NaN value, so that I can store it in a variable and use it later.

So far I have tried the code below, but it doesn't give me exactly what I want: I don't want an array, just a single value.

for i in table.itertuples():
    x = np.where(np.isnan(i))
    print(x)

(array([6]),)
(array([4, 5, 6]),)
(array([6]),)
(array([], dtype=int64),)

Thanks in advance for any comments or advice!

Florian Bernard asked Mar 06 '23 01:03


2 Answers

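Build a boolean mask with isna(), take the label of the first True in each row with idxmax, and blank out the rows that contain no NaN at all (idxmax alone would return the first column for those rows).

table.isna().idxmax(axis=1).where(table.isna().any(axis=1))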

#0      f
#1      d
#2      f
#3    NaN
#dtype: object
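
To see why the where(...) mask matters, and to pull a single row's result into a variable as the question asks, here is a small sketch. It rebuilds the question's table so that it runs on its own; the names is_na, unmasked, first_nan_col and x are only illustrative.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

is_na = table.isna()

# Without the mask, a row with no NaN reports the first column,
# because idxmax returns the first label when every value is False.
unmasked = is_na.idxmax(axis=1)
print(unmasked.loc[3])   # a

# Keep the label only for rows that really contain a NaN.
first_nan_col = unmasked.where(is_na.any(axis=1))

# A single value for one row, stored in a variable.
x = first_nan_col.loc[1]
print(x)                 # d
```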

Or if you need the column indices, as commented by @hpaulj, you can use argmax:

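import numpy as np
# Boolean array: True where the table holds a NaN
is_missing = table.isna().to_numpy()
# Position of the first True per row; NaN for rows without any missing value
np.where(is_missing.any(axis=1), is_missing.argmax(axis=1), np.nan)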

# array([ 5.,  3.,  5., nan])
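
The np.nan placeholder forces the result to float (5., 3., ...). If integer positions are preferred, one option that is not part of the answer above is pandas' nullable Int64 dtype, which marks rows without a NaN as <NA>. This is only a sketch; the helper name first_nan_position is hypothetical.

```python
import numpy as np
import pandas as pd

def first_nan_position(df):
    """Return, per row, the integer position of the first NaN, or <NA> if none."""
    is_missing = df.isna().to_numpy()
    # argmax gives the position of the first True in each row (0 if all False)
    positions = pd.Series(is_missing.argmax(axis=1), index=df.index).astype("Int64")
    # Rows with no NaN at all get a missing marker instead of a misleading 0
    positions.loc[~is_missing.any(axis=1)] = pd.NA
    return positions

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

positions = first_nan_position(table)
x = positions.loc[1]   # position of the first NaN in row 1
print(positions)
```

Individual values can then be read with positions.loc[row_label], and pd.isna(...) tells whether a row had no NaN.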
Psidom answered Mar 15 '23 12:03


Use argmax on the NaN mask (a row with no NaN returns 0):

t = np.isnan(table.values).argmax(axis=1)
print (t)
[5 3 5 0]

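If you need a value that unambiguously marks rows with no NaN, add the index as a leading column with reset_index(). Every column position then shifts by one, so 0 can only mean that the row contains no NaN: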

t = np.isnan(table.reset_index().values).argmax(axis=1)
print (t)
[6 4 6 0]
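
The plain argmax result cannot tell a row with no NaN from a row whose first NaN sits in the very first column: both give 0. Here is a minimal sketch of that ambiguity and of how the reset_index() shift resolves it. The two-row frame demo is made up for illustration and is not from the question.

```python
import numpy as np
import pandas as pd

# Row 0 has a NaN in the first column; row 1 has no NaN at all.
demo = pd.DataFrame({'a': [np.nan, 0.0],
                     'b': [1.0, 1.0]})

# Plain argmax: both rows give 0, so the two cases look the same.
plain = np.isnan(demo.to_numpy()).argmax(axis=1)
print(plain)     # [0 0]

# With the index prepended as column 0 (never NaN here), positions are
# shifted by one and 0 is reserved for "no NaN in this row".
shifted = np.isnan(demo.reset_index().to_numpy()).argmax(axis=1)
print(shifted)   # [1 0]
```

Subtracting 1 from the shifted result gives the original column positions, with -1 for rows without a NaN.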
jezrael answered Mar 15 '23 13:03