
In pandas, how to assign the result of a groupby aggregate to the next group in the original df?

Tags: python, pandas

Using pandas, I would like to apply groupby with an aggregate function (e.g. mean) and put the results back into the original dataframe, but assigned to the next group rather than to the group itself. How can I do this in a vectorized way?

I have a pandas dataframe like this:

import pandas as pd

data = {'Group': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D'],
        'Value': [1.1, 1.3, 9.1, 9.2, 9.5, 9.4, 6.2, 6.4, 2.2, 2.3]}

df = pd.DataFrame(data, columns=['Group', 'Value'])

print(df)

  Group  Value
0     A    1.1
1     A    1.3
2     B    9.1
3     B    9.2
4     B    9.5
5     B    9.4
6     C    6.2
7     C    6.4
8     D    2.2
9     D    2.3

I would like to get this, where each group is assigned the mean value of the previous group:

  Group  Value
0     A    NaN
1     A    NaN
2     B    1.2
3     B    1.2
4     B    1.2
5     B    1.2
6     C    9.3
7     C    9.3
8     D    6.3
9     D    6.3

I tried this, but it gives each group its own mean, without the shift to the next group:

df.groupby('Group')['Value'].transform('mean')
rudi2013 asked Dec 31 '22 00:12


2 Answers

Easy, use map on a groupby result:

df['Value'] = df['Group'].map(df.groupby('Group')['Value'].mean().shift()) 
df
  Group  Value
0     A    NaN
1     A    NaN
2     B    1.2
3     B    1.2
4     B    1.2
5     B    1.2
6     C    9.3
7     C    9.3
8     D    6.3
9     D    6.3

How It Works

Get the mean

df.groupby('Group')['Value'].mean()

Group
A    1.20
B    9.30
C    6.30
D    2.25
Name: Value, dtype: float64

Shift it down by 1

df.groupby('Group')['Value'].mean().shift() 

Group
A    NaN
B    1.2
C    9.3
D    6.3
Name: Value, dtype: float64

Map it back.

df['Group'].map(df.groupby('Group')['Value'].mean().shift())  

0    NaN
1    NaN
2    1.2
3    1.2
4    1.2
5    1.2
6    9.3
7    9.3
8    6.3
9    6.3
Name: Group, dtype: float64
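
For reference, the three steps above can be put together as one self-contained script. This is only a sketch: the column name PrevMean and the intermediate variable names are made up for illustration, and sort=False is an addition that is not in the one-liner above. By default groupby sorts the group keys, so "previous group" means previous in sorted key order; with sort=False it means previous in order of first appearance. In the example data the two orders are the same.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Value': [1.1, 1.3, 9.1, 9.2, 9.5, 9.4, 6.2, 6.4, 2.2, 2.3],
})

# Step 1: one mean per group.
# sort=False (an assumption, not in the original answer) keeps the groups
# in order of first appearance instead of sorted key order.
group_means = df.groupby('Group', sort=False)['Value'].mean()

# Step 2: shift the means down one position, so each group label
# now points at the mean of the group before it (NaN for the first group).
prev_means = group_means.shift()

# Step 3: look up every row's group label in the shifted Series.
# 'PrevMean' is a hypothetical column name; the original overwrites 'Value'.
df['PrevMean'] = df['Group'].map(prev_means)

print(df)
```

Note that rows are matched by group label only, so a label that shows up in several separate runs is still treated as a single group.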
cs95 answered May 22 '23 06:05


You can compute each group's mean with GroupBy.mean, shift the result with pd.Series.shift, and take advantage of pandas index alignment when assigning it back.

df.set_index('Group').assign(value = df.groupby('Group')['Value'].mean().shift()).reset_index()

  Group  Value  value
0     A    1.1    NaN
1     A    1.3    NaN
2     B    9.1    1.2
3     B    9.2    1.2
4     B    9.5    1.2
5     B    9.4    1.2
6     C    6.2    9.3
7     C    6.4    9.3
8     D    2.2    6.3
9     D    2.3    6.3
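
To make the index alignment visible, here is the same idea split into separate statements. This is a sketch under the same example data; the variable names indexed and result are made up for illustration. The shifted Series has one entry per group label, and assigning it to a frame indexed by Group matches on those labels, so every row with the same label receives the same value.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Value': [1.1, 1.3, 9.1, 9.2, 9.5, 9.4, 6.2, 6.4, 2.2, 2.3],
})

# Shifted group means: a Series indexed by the unique group labels A, B, C, D.
prev_means = df.groupby('Group')['Value'].mean().shift()

# Make 'Group' the row index so the assignment below aligns on group labels.
indexed = df.set_index('Group')

# Assignment aligns on index labels: the frame's index has repeated labels,
# the Series has unique ones, so each row gets the value stored under its label.
indexed['value'] = prev_means

# Move 'Group' back to an ordinary column.
result = indexed.reset_index()

print(result)
```

This only works because the Series index is unique; the alignment would fail if prev_means itself contained duplicate labels.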
Ch3steR answered May 22 '23 05:05