
Efficient way to select most recent index with finite value in column from Pandas DataFrame?

I'm trying to find the most recent index with a value that is not 'NaN' relative to the current index. So, say I have a DataFrame with 'NaN' values like this:

       A       B       C
0    2.1     5.3     4.7
1    5.1     4.6     NaN
2    5.0     NaN     NaN
3    7.4     NaN     NaN
4    3.5     NaN     NaN
5    5.2     1.0     NaN
6    5.0     6.9     5.4
7    7.4     NaN     NaN
8    3.5     NaN     5.8

If I am currently at index 4, I have the values:

       A       B       C
4    3.5     NaN     NaN

I want to know the last known value of 'B' relative to index 4, which is at index 1:

       A       B       C
1    5.1   -> 4.6    NaN

I know I can get a list of all indexes with NaN values using something like:

indexes = df.index[df['B'].apply(np.isnan)]

But this seems inefficient for a large DataFrame. Is there a way to get just the most recent non-NaN index relative to the current index?

asked Dec 24 '22 00:12 by alphaleonis



1 Answer

You could try something like this: convert the index to a Series that has NaN wherever column B is NaN, then use ffill(), which carries the last non-missing index forward over all subsequent NaNs:

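import numpy as np
import pandas as pd

# Keep the index label where B has a value, NaN elsewhere, then carry the last label forward.
df['Last_index_notnull'] = df.index.to_series().where(df.B.notnull(), np.nan).ffill()
# Carry the last non-missing value of B forward.
df['Last_value_notnull'] = df.B.ffill()
df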


Now, at index 4, you can see that the last non-missing value of B is 4.6 and that it occurs at index 1.
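If you want to run this end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and applies the same forward-fill idea. The names has_value, current, last_idx and last_val are only illustrative. The single-position lookup at the end uses Series.last_valid_index(), which is not part of the approach above; it is an alternative that may be enough if you only need the answer for one index rather than for every row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [2.1, 5.1, 5.0, 7.4, 3.5, 5.2, 5.0, 7.4, 3.5],
    "B": [5.3, 4.6, np.nan, np.nan, np.nan, 1.0, 6.9, np.nan, np.nan],
    "C": [4.7, np.nan, np.nan, np.nan, np.nan, np.nan, 5.4, np.nan, 5.8],
})

# Vectorized version: keep the index label where B has a value,
# leave NaN elsewhere, then forward-fill the labels and the values.
has_value = df["B"].notna()
df["Last_index_notnull"] = df.index.to_series().where(has_value).ffill()
df["Last_value_notnull"] = df["B"].ffill()

# Alternative for a single position (an assumption about the use case):
# slice B up to and including the current label and ask for the last
# label that holds a non-NaN value.
current = 4
last_idx = df.loc[:current, "B"].last_valid_index()
last_val = df.at[last_idx, "B"]

print(last_idx, last_val)
```

Both variants treat the current row as a candidate, so if B already has a value at the current index you get that index back. Note that last_valid_index() returns None when there is no non-NaN value in the slice, so the df.at lookup would need a guard in that case.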

answered Dec 26 '22 14:12 by Psidom
