   A   B  C
0  1  10  2
1  1  15  2
2  1  14  2
3  2  11  4
4  2  12  4
5  2  13  4
6  2  16  4
7  1  18  2
This is my sample DataFrame.
I want to group by column 'A' and apply a rolling sum on column 'B', with the window size taken from column 'C'. For example, when A is 1 the window size should be 2. Where fewer rows remain than the window size, I want the sum of the remaining values instead of NaN.
Currently my output is:
A
1  0    25.0
   1    29.0
   2    32.0
   7     NaN
2  3    23.0
   4    25.0
   5    29.0
   6     NaN
Code for the above:
df['B'].groupby(df['A']).rolling(df['C'][0]).sum().shift(-1)
When C = 4, I want the rolling window to be 4, and I don't want NaN.
The desired output should be as follows:
   A   B  C  Rolling_sum
0  1  10  2           25
1  1  15  2           29
2  1  14  2           32
7  1  18  2           18
3  2  11  4           52
4  2  12  4           41
5  2  13  4           29
6  2  16  4           16
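To make the target concrete, the desired column can be reproduced with a plain-Python sketch that uses no pandas at all. It reads the requirement as "for each row, sum B over this row and the next C - 1 rows of the same A group, or over whatever rows are left". The function name `forward_group_sums` and the tuple layout are made up for illustration, and the sketch uses each row's own C value, which in the sample is constant within a group.

```python
from collections import defaultdict

# Rows of the sample frame as (index, A, B, C) tuples.
rows = [
    (0, 1, 10, 2), (1, 1, 15, 2), (2, 1, 14, 2),
    (3, 2, 11, 4), (4, 2, 12, 4), (5, 2, 13, 4),
    (6, 2, 16, 4), (7, 1, 18, 2),
]

def forward_group_sums(rows):
    """Map each row index to the sum of B over that row and the next C - 1 rows of its A group."""
    groups = defaultdict(list)
    for idx, a, b, c in rows:
        groups[a].append((idx, b, c))
    result = {}
    for members in groups.values():
        values = [b for _, b, _ in members]
        for pos, (idx, _, c) in enumerate(members):
            # Slicing past the end simply yields a shorter list, so the last
            # rows of a group get the sum of the remaining values, not a gap.
            result[idx] = sum(values[pos:pos + c])
    return result

expected = forward_group_sums(rows)
print(expected)
```

This is only a reference for checking results on small inputs; it is not meant to be fast.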
Because you want to pass a dynamic window taken from column C, use a lambda function, and reverse the row order with iloc[::-1] so that each window covers the current row and the rows that follow it:
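df = df.sort_values('A')
# Reverse the rows so each backward-looking window spans the following rows;
# min_periods=0 returns partial sums instead of NaN for incomplete windows.
df['Rolling_sum'] = (df.iloc[::-1].groupby('A')
                       .apply(lambda x: x.B.rolling(x.C.iat[0], min_periods=0).sum())
                       .reset_index(level=0, drop=True))
print(df)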
   A   B  C  Rolling_sum
0  1  10  2         25.0
1  1  15  2         29.0
2  1  14  2         32.0
7  1  18  2         18.0
3  2  11  4         52.0
4  2  12  4         41.0
5  2  13  4         29.0
6  2  16  4         16.0
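Why the reversal matters can be seen on a single group. The following is a minimal sketch, assuming only the B values of group A == 1 from the sample; the variable names `s` and `forward` are illustrative, and `min_periods=1` is used here in place of the `min_periods=0` above, which gives the same result whenever every window contains at least one row.

```python
import pandas as pd

# B values of group A == 1 from the sample, kept under their original index labels.
s = pd.Series([10, 15, 14, 18], index=[0, 1, 2, 7])

# rolling() looks backward: each window ends at the current row.
# Reversing first turns that into "current row plus the following rows",
# and min_periods=1 lets the shorter windows at the end yield partial sums.
forward = s.iloc[::-1].rolling(2, min_periods=1).sum().iloc[::-1]
print(forward.tolist())
```

Reversing a second time restores the original row order, so the values line up with index labels 0, 1, 2, 7.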
A solution using strides, if performance is important (this depends on the number of groups and their sizes, so it is best to test on real data):
import numpy as np
import pandas as pd

def rolling_window(a, window):
    # Pad with window - 1 leading zeros so the first windows give partial sums.
    a = np.concatenate([[0] * (window - 1), a])
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides).sum(axis=1)
df = df.sort_values('A')
df['Rolling_sum'] = (df.iloc[::-1].groupby('A')
                       .apply(lambda x: pd.Series(rolling_window(x.B, x.C.iat[0]),
                                                  index=x.index))
                       .reset_index(level=0, drop=True))
print(df)
   A   B  C  Rolling_sum
0  1  10  2           25
1  1  15  2           29
2  1  14  2           32
7  1  18  2           18
3  2  11  4           52
4  2  12  4           41
5  2  13  4           29
6  2  16  4           16
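As a possible variation (not part of the answer above), the same zero-padded window sum can be sketched with `sliding_window_view`, which NumPy provides from version 1.20 onward and which computes the shape and strides itself instead of taking them by hand as `as_strided` does. The helper name `rolling_sum_padded` is made up for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_sum_padded(values, window):
    """Trailing rolling sum that treats missing leading values as 0."""
    values = np.asarray(values)
    # Pad with window - 1 zeros so the first windows give partial sums.
    padded = np.concatenate([np.zeros(window - 1, dtype=values.dtype), values])
    # One row per position, each holding `window` consecutive elements.
    return sliding_window_view(padded, window).sum(axis=1)

# Group A == 1 in reversed row order, as produced by df.iloc[::-1].
print(rolling_sum_padded([18, 14, 15, 10], 2))
```

Under that assumption it could stand in for `rolling_window` inside the `apply` call; whether it is faster or slower is something to measure on real data.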