Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Question

I'm looking to take a pandas series and fill NaN with the average of the next numerical value where: average = next numerical value / (# consecutive NaNs + 1)

Here's my code so far, I just can't figure out how to divide the filler column among the NaNs (and the next numerical value as well) in num:

import pandas as pd

dates = pd.date_range(start = '1/1/2016',end = '1/12/2016', freq = 'D')
nums = [10, 12, None, None, 39, 10, 11, None, None, None, None, 60]

df = pd.DataFrame({
        'date':dates, 
        'num':nums
        })

df['filler'] = df['num'].fillna(method = 'bfill')

Current Output:

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    39.0
3  2016-01-04   NaN    39.0
4  2016-01-05  39.0    39.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    60.0
8  2016-01-09   NaN    60.0
9  2016-01-10   NaN    60.0
10 2016-01-11   NaN    60.0
11 2016-01-12  60.0    60.0

Desired Output:

         date   num
0  2016-01-01  10.0
1  2016-01-02  12.0
2  2016-01-03  13.0
3  2016-01-04  13.0
4  2016-01-05  13.0
5  2016-01-06  10.0
6  2016-01-07  11.0
7  2016-01-08  12.0
8  2016-01-09  12.0
9  2016-01-10  12.0
10 2016-01-11  12.0
11 2016-01-12  12.0

piRSquared · Accepted Answer

Take a reverse cumsum of notnull
Use that to groupby and transform with mean

csum = df.num.notnull()[::-1].cumsum()
filler = df.num.fillna(0).groupby(csum).transform('mean')
df.assign(filler=filler)

         date   num  filler
0  2016-01-01  10.0    10.0
1  2016-01-02  12.0    12.0
2  2016-01-03   NaN    13.0
3  2016-01-04   NaN    13.0
4  2016-01-05  39.0    13.0
5  2016-01-06  10.0    10.0
6  2016-01-07  11.0    11.0
7  2016-01-08   NaN    12.0
8  2016-01-09   NaN    12.0
9  2016-01-10   NaN    12.0
10 2016-01-11   NaN    12.0
11 2016-01-12  60.0    12.0

how it works

df.num.notnull().cumsum() is a standard technique to find groups of contiguous nulls. However, I wanted my groups to end with the next numeric value. So I reversed the series and then cumsum'd.
I want my average to include the number of nulls. Easiest way to do that is to fill with zero and take a normal mean over the groups I I just made.
transform to broadcast across the existing index
assign new column. Despite having reversed the series, the index will realign like magic. Could have used loc but that overwrites the existing df. I'll let OP decide to overwrite if they want to.

Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Tags:

python

pandas

pandas-groupby

elPastor

1 Answers

piRSquared

Recent Activity

Donate For Us

Pandas: Fill NaNs with next non-NaN / # consecutive NaNs

Tags:

python

pandas

pandas-groupby

elPastor

1 Answers

piRSquared

Related questions

Recent Activity

Donate For Us