Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find index of the first and/or last value in a column that is not NaN

I am dealing with sub-surface measurements from a borehole where each measurement type covers a different range of depths. Depth is being used as the index in this case.

I need to find the depth (index) of the first and/or last occurrence of data (non-NaN value) for each measurement type.

Getting the depth (index) of the first or last row of the dataframe is easy: df.index[0] or df.index[-1]. The trick is in finding the index of the first or last non-NaN occurrence of any given column.

df = pd.DataFrame([[500, np.NaN, np.NaN,     25],
                   [501, np.NaN, np.NaN,     27],
                   [502, np.NaN,     33,     24],
                   [503,      4,     32,     18],
                   [504,     12,     45,      5],
                   [505,      8,     38, np.NaN]])
df.columns = ['Depth','x1','x2','x3']
df.set_index('Depth')

enter image description here

The ideal solution would produce an index (depth) of 503 for the first occurrence of x1, 502 for the first occurrence of x2, and 504 for the last occurrence of x3.

like image 328
fact_finder Avatar asked Jul 31 '19 14:07

fact_finder


People also ask

How do you check if a value is not NaN in pandas?

isna() in pandas library can be used to check if the value is null/NaN. It will return True if the value is NaN/null.

What is first () in pandas?

Pandas DataFrame first() Method The first() method returns the first n rows, based on the specified value. The index have to be dates for this method to work as expected.


4 Answers

first_valid_index() and last_valid_index() can be used.

    >>> df
             x1    x2    x3
    Depth
    500     NaN   NaN  25.0
    501     NaN   NaN  27.0
    502     NaN  33.0  24.0
    503     4.0  32.0  18.0
    504    12.0  45.0   5.0
    505     8.0  38.0   NaN
    >>> df["x1"].first_valid_index()
    503
    >>> df["x2"].first_valid_index()
    502
    >>> df["x3"].first_valid_index()
    500
    >>> df["x3"].last_valid_index()
    504
like image 122
Spring Avatar answered Oct 17 '22 22:10

Spring


You can agg :

df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[::-1].idxmax()})
#df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[x].last_valid_index()})

x1    503
x2    502
x3    504

Another way would be to check if first row is nan and according to that apply the condition:

np.where(df.iloc[0].isna(),df.notna().idxmax(),df.notna()[::-1].idxmax())

[503, 502, 504]
like image 25
anky Avatar answered Oct 17 '22 21:10

anky


Let's try this, if I understand you correctly:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth'])

Output:

      Min_Depth   Max_Depth
x1          503         505
x2          502         505
x3          500         504

Or Transpose output:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth']).T

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504

Using apply with a list of func:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])

Output:

                    x1   x2   x3
first_valid_index  503  502  500
last_valid_index   505  505  504

With a little renaming:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])\
  .set_axis(['Min_Depth', 'Max_Depth'], axis=0, inplace=False)

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504
like image 2
Scott Boston Avatar answered Oct 17 '22 21:10

Scott Boston


IIUC

df.stack().groupby(level=1).head(1)
Out[619]: 
Depth    
500    x3    25.0
502    x2    33.0
503    x1     4.0
dtype: float64
like image 2
BENY Avatar answered Oct 17 '22 23:10

BENY