Using pandas I like to use groupby and an aggregate function, e.g. mean and then put the results back in the original dataframe, but in the next group and not in the group itself. How to do this in a vectorized way?
I have a pandas dataframe like this:
data = {'Group': ['A','A','B','B','B','B', 'C','C', 'D','D'],
'Value': [1.1,1.3,9.1,9.2,9.5,9.4,6.2,6.4,2.2,2.3]
}
df = pd.DataFrame(data, columns = ['Group','Value'])
print (df)
Group Value
0 A 1.1
1 A 1.3
2 B 9.1
3 B 9.2
4 B 9.5
5 B 9.4
6 C 6.2
7 C 6.4
8 D 2.2
9 D 2.3
I like to get this, where each group has the mean value of the previous group.
Group Value
0 A NaN
1 A NaN
2 B 1.2
3 B 1.2
4 B 1.2
5 B 1.2
6 C 9.3
7 C 9.3
8 D 6.3
9 D 6.3
I tried this, but this is without the shift to the next group
df.groupby('Group')['Value'].transform('mean')
Easy, use map
on a groupby result:
df['Value'] = df['Group'].map(df.groupby('Group')['Value'].mean().shift())
df
Group Value
0 A NaN
1 A NaN
2 B 1.2
3 B 1.2
4 B 1.2
5 B 1.2
6 C 9.3
7 C 9.3
8 D 6.3
9 D 6.3
How It Works
Get the mean
df.groupby('Group')['Value'].mean()
Group
A 1.20
B 9.30
C 6.30
D 2.25
Name: Value, dtype: float64
Shift it down by 1
df.groupby('Group')['Value'].mean().shift()
Group
A NaN
B 1.2
C 9.3
D 6.3
Name: Value, dtype: float64
Map it back.
df['Group'].map(df.groupby('Group')['Value'].mean().shift())
0 NaN
1 NaN
2 1.2
3 1.2
4 1.2
5 1.2
6 9.3
7 9.3
8 6.3
9 6.3
Name: Group, dtype: float64
You can calculate aggregated GroupBy.mean
of each group value and use pd.Series.shift
and take advantage of pandas index alignment.
df.set_index('Group').assign(value = df.groupby('Group').mean().shift()).reset_index()
Group Value value
0 A 1.1 NaN
1 A 1.3 NaN
2 B 9.1 1.2
3 B 9.2 1.2
4 B 9.5 1.2
5 B 9.4 1.2
6 C 6.2 9.3
7 C 6.4 9.3
8 D 2.2 6.3
9 D 2.3 6.3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With