Find index of the first and/or last value in a column that is not NaN

Tags:

python

pandas

dataframe

numpy

I am dealing with sub-surface measurements from a borehole where each measurement type covers a different range of depths. Depth is being used as the index in this case.

I need to find the depth (index) of the first and/or last occurrence of data (non-NaN value) for each measurement type.

Getting the depth (index) of the first or last row of the dataframe is easy: df.index[0] or df.index[-1]. The trick is in finding the index of the first or last non-NaN occurrence of any given column.

import numpy as np
import pandas as pd

df = pd.DataFrame([[500, np.nan, np.nan,     25],
                   [501, np.nan, np.nan,     27],
                   [502, np.nan,     33,     24],
                   [503,      4,     32,     18],
                   [504,     12,     45,      5],
                   [505,      8,     38, np.nan]])
df.columns = ['Depth','x1','x2','x3']
df = df.set_index('Depth')  # set_index returns a new frame, so assign it back

The ideal solution would produce an index (depth) of 503 for the first occurrence of x1, 502 for the first occurrence of x2, and 504 for the last occurrence of x3.

328

asked Jul 31 '19 14:07

fact_finder

4 Answers

Series.first_valid_index() and Series.last_valid_index() return the index label of the first and last non-NaN value in a column:

    >>> df
             x1    x2    x3
    Depth
    500     NaN   NaN  25.0
    501     NaN   NaN  27.0
    502     NaN  33.0  24.0
    503     4.0  32.0  18.0
    504    12.0  45.0   5.0
    505     8.0  38.0   NaN
    >>> df["x1"].first_valid_index()
    503
    >>> df["x2"].first_valid_index()
    502
    >>> df["x3"].first_valid_index()
    500
    >>> df["x3"].last_valid_index()
    504
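For anyone who wants to reproduce this without the session above, here is a minimal self-contained sketch. The frame is rebuilt from the question's data, and the extra column named "empty" is a hypothetical addition (not in the original question) to show that both methods return None when a column has no valid values at all.

```python
import numpy as np
import pandas as pd

# Same data as in the question, with Depth as the index.
df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
        "empty": [np.nan] * 6,  # hypothetical column with no data at all
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

first_x1 = df["x1"].first_valid_index()    # depth of the first x1 reading
last_x3 = df["x3"].last_valid_index()      # depth of the last x3 reading
no_data = df["empty"].first_valid_index()  # None: nothing valid to find

print(first_x1, last_x3, no_data)
```

Since an all-NaN column gives None rather than raising, it may be worth checking for None before using the result as a depth.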

122

answered Oct 17 '22 22:10

Spring

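You can use agg on the boolean mask returned by notna(). idxmax gives the index of the first True, so reversing the column first gives the last one: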

df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[::-1].idxmax()})
#df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[x].last_valid_index()})

x1    503
x2    502
x3    504

Another way is to check whether each column is NaN in the first row: if it is, take the index of the first valid value; otherwise take the index of the last valid value:

np.where(df.iloc[0].isna(),df.notna().idxmax(),df.notna()[::-1].idxmax())

[503, 502, 504]
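The idxmax trick relies on idxmax returning the label of the first maximum, which on a boolean mask is the first True. A minimal sketch of both directions for every column at once, rebuilding the question's frame (the variable names are my own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

mask = df.notna()                 # True wherever a measurement exists

first_valid = mask.idxmax()       # label of the first True in each column
last_valid = mask[::-1].idxmax()  # reversed rows: first True is the last valid

print(first_valid.to_dict())
print(last_valid.to_dict())
```

One caveat, as far as I know: idxmax on a mask with no True values still returns a label, so a column with no data at all probably needs a separate mask.any() check.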

answered Oct 17 '22 21:10

anky

Let's try this, if I understand you correctly:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth'])

Output:

      Min_Depth   Max_Depth
x1          503         505
x2          502         505
x3          500         504

Or transpose the output:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth']).T

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504

Using apply with a list of functions:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])

Output:

                    x1   x2   x3
first_valid_index  503  502  500
last_valid_index   505  505  504

With a little renaming:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])\
  .set_axis(['Min_Depth', 'Max_Depth'], axis=0)

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504
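If you would rather not pass a list of functions to apply, the same table can be built with a plain loop over the columns. This is only a sketch: the helper name valid_depth_range is hypothetical, and the frame is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)


def valid_depth_range(frame):
    """Return the first and last index label holding data, per column.

    Hypothetical helper, not part of pandas.
    """
    return pd.DataFrame(
        {
            col: {
                "Min_Depth": frame[col].first_valid_index(),
                "Max_Depth": frame[col].last_valid_index(),
            }
            for col in frame.columns
        },
        index=["Min_Depth", "Max_Depth"],  # fix the row order explicitly
    )


ranges = valid_depth_range(df)
print(ranges)
```

Individual depths can then be looked up by label, for example ranges.loc["Min_Depth", "x1"].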

answered Oct 17 '22 21:10

Scott Boston

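If I understand correctly, you can stack the frame into a single Series (stack() dropped the NaN values in the pandas version used here) and keep the first remaining row for each column:

df.stack().groupby(level=1).head(1)

Output:

Depth    
500    x3    25.0
502    x2    33.0
503    x1     4.0
dtype: float64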
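The same idea extends to the last occurrence by using tail(1) instead of head(1). In the sketch below I add an explicit dropna() after stack(), on the assumption that not every pandas version drops NaN values in stack() by default. The frame is rebuilt from the question's data and the variable names are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

# Long format: one row per (Depth, column) pair. The explicit dropna() keeps
# the result the same whether or not stack() drops NaN on its own.
stacked = df.stack().dropna()

first_rows = stacked.groupby(level=1).head(1)  # first valid row per column
last_rows = stacked.groupby(level=1).tail(1)   # last valid row per column

# Turn the (Depth, column) index pairs into plain column -> depth lookups.
first_depth = {col: depth for depth, col in first_rows.index}
last_depth = {col: depth for depth, col in last_rows.index}

print(first_depth)
print(last_depth)
```

Compared with first_valid_index(), this also returns the measured value at each depth, at the cost of reshaping the whole frame.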

answered Oct 17 '22 23:10

BENY
