I am trying to dynamically create a column name with some suffix added. I can do it in for loop but I think it will be inefficient. Is there a way to do it dynamically instead.
from pandas import Timestamp
import pandas as pd
df = pd.DataFrame({'B': range(1,6),'A':['A','A','A','B','B'],'D':[2,3,4,5,6]})
df['C'] = [Timestamp('20130101 09:00:00'),
Timestamp('20130101 09:00:02'),
Timestamp('20130102 09:00:03'),
Timestamp('20130101 09:00:05'),
Timestamp('20130101 09:00:06')]
I can do it in multiple step like :
df['D_2days']=df.groupby('A').rolling('4d',on='C')['D'].sum().values
df['B_2days']=df.groupby('A').rolling('4d',on='C')['B'].sum().values
Is there a way to do it in 1 go. I have more than 1000 columns that needs to be summed and having a loop might not be a great idea.
Thank you, Sam
We can group and calculate rolling sum on multiple columns in on go by passing the list of column names on which to calculate rolling sum, then assign the calculated rolling sum to the columns inside a dataframe after adding the required prefix
c = pd.Index(['B', 'D'])
df[c + '_2days'] = df.groupby('A', sort=False).rolling('4d',on='C')[c].sum().values
B A D C B_2days D_2days
0 1 A 2 2013-01-01 09:00:00 1.0 2.0
1 2 A 3 2013-01-01 09:00:02 3.0 5.0
2 3 A 4 2013-01-02 09:00:03 6.0 9.0
3 4 B 5 2013-01-01 09:00:05 4.0 5.0
4 5 B 6 2013-01-01 09:00:06 9.0 11.0
IIUC,
cols = ['D', 'B']
df.merge(df.groupby('A').rolling('4d',on='C')[cols].sum().add_suffix('_2days'),
right_index=True, left_on=['A', 'C'])
Output:
B A D C D_2days B_2days
0 1 A 2 2013-01-01 09:00:00 2.0 1.0
1 2 A 3 2013-01-01 09:00:02 5.0 3.0
2 3 A 4 2013-01-02 09:00:03 9.0 6.0
3 4 B 5 2013-01-01 09:00:05 5.0 4.0
4 5 B 6 2013-01-01 09:00:06 11.0 9.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With