Efficient way to select most recent index with finite value in column from Pandas DataFrame?

Question

I'm trying to find the most recent index with a value that is not 'NaN' relative to the current index. So, say I have a DataFrame with 'NaN' values like this:

       A       B       C
0    2.1     5.3     4.7
1    5.1     4.6     NaN
2    5.0     NaN     NaN
3    7.4     NaN     NaN
4    3.5     NaN     NaN
5    5.2     1.0     NaN
6    5.0     6.9     5.4
7    7.4     NaN     NaN
8    3.5     NaN     5.8

If I am currently at index 4, I have the values:

       A       B       C
4    3.5     NaN     NaN

I want to know the last known value of 'B' relative to index 4, which is at index 1:

       A       B       C
1    5.1   -> 4.6    NaN

I know I can get a list of all indexes with NaN values using something like:

indexes = df.index[df['B'].apply(np.isnan)]

But this seems inefficient for a large DataFrame. Is there a way to get just the last one relative to the current index?

Psidom · Accepted Answer

You could try something like this: convert the index to a Series that has NaN wherever column B is NaN, and then use ffill(), which carries the last non-missing index forward over all subsequent NaNs:

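import pandas as pd
import numpy as np
# Keep each index label where B is not null, NaN elsewhere, then forward-fill
df['Last_index_notnull'] = df.index.to_series().where(df.B.notnull(), np.nan).ffill()
# Forward-fill B itself to get the last known value at every row
df['Last_value_notnull'] = df.B.ffill()
df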

Now, at index 4, you can see that the last non-missing value is 4.6 and its index is 1.
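
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample DataFrame from the question and applies the same forward-fill idea. The variable names (last_idx, last_val, one_off) are just illustrative. The last line is an alternative I'd suggest for a one-off lookup, not part of the approach above: it slices B up to the current label and calls last_valid_index().

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, np.nan, np.nan, np.nan, 1.0, 6.9, np.nan, np.nan],
    "C": [4.7, np.nan, np.nan, np.nan, np.nan, np.nan, 5.4, np.nan, 5.8],
})

# Keep the index label where B is present, NaN elsewhere, then forward-fill
# so every row carries the label of the most recent non-missing B.
last_idx = df.index.to_series().where(df["B"].notna()).ffill()

# Forward-fill B itself to get the most recent non-missing value.
last_val = df["B"].ffill()

print(last_idx.loc[4])  # 1.0 (float, because the intermediate Series held NaN)
print(last_val.loc[4])  # 4.6

# Alternative for a single lookup (an assumption about the use case):
# .loc slicing is label-based and inclusive, so this looks at rows 0..4 only.
one_off = df.loc[:4, "B"].last_valid_index()
print(one_off)  # 1
```

The forward-fill version computes the answer for every row in one vectorized pass, which is probably what you want if you need it at many indexes. The last_valid_index() variant avoids adding columns when you only care about one position.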
