I am dealing with sub-surface measurements from a borehole where each measurement type covers a different range of depths. Depth is being used as the index in this case.
I need to find the depth (index) of the first and/or last occurrence of data (non-NaN value) for each measurement type.
Getting the depth (index) of the first or last row of the dataframe is easy: df.index[0]
or df.index[-1]
. The trick is in finding the index of the first or last non-NaN occurrence of any given column.
df = pd.DataFrame([[500, np.NaN, np.NaN, 25],
[501, np.NaN, np.NaN, 27],
[502, np.NaN, 33, 24],
[503, 4, 32, 18],
[504, 12, 45, 5],
[505, 8, 38, np.NaN]])
df.columns = ['Depth','x1','x2','x3']
df.set_index('Depth')
The ideal solution would produce an index (depth) of 503 for the first occurrence of x1, 502 for the first occurrence of x2, and 504 for the last occurrence of x3.
isna() in pandas library can be used to check if the value is null/NaN. It will return True if the value is NaN/null.
Pandas DataFrame first() Method The first() method returns the first n rows, based on the specified value. The index have to be dates for this method to work as expected.
first_valid_index() and last_valid_index() can be used.
>>> df
x1 x2 x3
Depth
500 NaN NaN 25.0
501 NaN NaN 27.0
502 NaN 33.0 24.0
503 4.0 32.0 18.0
504 12.0 45.0 5.0
505 8.0 38.0 NaN
>>> df["x1"].first_valid_index()
503
>>> df["x2"].first_valid_index()
502
>>> df["x3"].first_valid_index()
500
>>> df["x3"].last_valid_index()
504
You can agg
:
df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[::-1].idxmax()})
#df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[x].last_valid_index()})
x1 503
x2 502
x3 504
Another way would be to check if first row is nan and according to that apply the condition:
np.where(df.iloc[0].isna(),df.notna().idxmax(),df.notna()[::-1].idxmax())
[503, 502, 504]
Let's try this, if I understand you correctly:
pd.concat([df.apply(pd.Series.first_valid_index),
df.apply(pd.Series.last_valid_index)],
axis=1,
keys=['Min_Depth', 'Max_Depth'])
Output:
Min_Depth Max_Depth
x1 503 505
x2 502 505
x3 500 504
Or Transpose output:
pd.concat([df.apply(pd.Series.first_valid_index),
df.apply(pd.Series.last_valid_index)],
axis=1,
keys=['Min_Depth', 'Max_Depth']).T
Output:
x1 x2 x3
Min_Depth 503 502 500
Max_Depth 505 505 504
Using apply with a list of func:
df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])
Output:
x1 x2 x3
first_valid_index 503 502 500
last_valid_index 505 505 504
With a little renaming:
df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])\
.set_axis(['Min_Depth', 'Max_Depth'], axis=0, inplace=False)
Output:
x1 x2 x3
Min_Depth 503 502 500
Max_Depth 505 505 504
IIUC
df.stack().groupby(level=1).head(1)
Out[619]:
Depth
500 x3 25.0
502 x2 33.0
503 x1 4.0
dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With