
Replace a column's values with its group means in a DataFrame

I have a DataFrame as

Page    Line    y
1        2      3.2
1        2      6.1
1        3      7.1
2        4      8.5
2        4      9.1

I need to replace the values in column y with the mean of y within each group. I can do this when grouping by a single column, using this code:

df['y'] = df['y'].groupby(df['Page'], group_keys=False).transform('mean') 

I am trying to replace the values of y with the mean of the groups defined by both 'Page' and 'Line', something like this:

Page    Line    y
1        2      4.65
1        2      4.65
1        3      7.1
2        4      8.8
2        4      8.8

I have searched through a lot of answers on this site but couldn't find this case. I am using Python 3 with pandas.

Akash Kumar asked Dec 13 '22 15:12


2 Answers

You need to pass a list of column names to the by parameter of groupby:

by : mapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

df['y'] = df.groupby(['Page', 'Line'])['y'].transform('mean')
print(df)
   Page  Line     y
0     1     2  4.65
1     1     2  4.65
2     1     3  7.10
3     2     4  8.80
4     2     4  8.80
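
To make clearer what transform is doing here, a minimal sketch that rebuilds the question's data and compares transform with a longer two-step route (aggregate, then merge back). The variable names means, merged and transformed are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Page": [1, 1, 1, 2, 2],
    "Line": [2, 2, 3, 4, 4],
    "y": [3.2, 6.1, 7.1, 8.5, 9.1],
})

# Step 1: one mean per (Page, Line) group, i.e. one row per group.
means = df.groupby(["Page", "Line"], as_index=False)["y"].mean()

# Step 2: broadcast those means back onto the original rows.
merged = df[["Page", "Line"]].merge(means, on=["Page", "Line"], how="left")

# transform('mean') does the same broadcast in one step and returns a
# Series aligned with the original index, ready to assign to df['y'].
transformed = df.groupby(["Page", "Line"])["y"].transform("mean")

print(means)
print(transformed)
```

Both routes should give the same per-row values; transform is simply shorter and keeps the original row order and index.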

Your own solution can be adapted in the same way: pass the Series in a list:

df['y'] = df['y'].groupby([df['Page'], df['Line']]).transform('mean') 
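
As a quick check that the two spellings group the same way, here is a small sketch on the question's data. It also includes the single-key version from the question for contrast; the names by_labels, by_series and by_page_only are made up for this example:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Page": [1, 1, 1, 2, 2],
    "Line": [2, 2, 3, 4, 4],
    "y": [3.2, 6.1, 7.1, 8.5, 9.1],
})

# Form 1: list of column labels, then select the column to transform.
by_labels = df.groupby(["Page", "Line"])["y"].transform("mean")

# Form 2: group the y Series by a list of Series.
by_series = df["y"].groupby([df["Page"], df["Line"]]).transform("mean")

# Single key, as in the question: rows with Line 2 and Line 3 on
# Page 1 now share one mean, which is not what was asked for.
by_page_only = df["y"].groupby(df["Page"]).transform("mean")

print(by_labels.tolist())
print(by_series.tolist())
print(by_page_only.tolist())
```

The first two lists should match row for row, while the third differs for the Page 1 rows.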
jezrael answered Feb 28 '23 05:02


So you want this:

df['y'] = df.groupby(['Page', 'Line'])['y'].transform('mean')
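
Selecting ['y'] before transform starts to matter once the frame has more columns than the three in the question. A minimal sketch, assuming a hypothetical extra numeric column z that is not part of the original data:

```python
import pandas as pd

# Question's data plus a hypothetical extra column z (an assumption,
# added only to show what happens with more than one value column).
df = pd.DataFrame({
    "Page": [1, 1, 1, 2, 2],
    "Line": [2, 2, 3, 4, 4],
    "y": [3.2, 6.1, 7.1, 8.5, 9.1],
    "z": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Without a column selection, transform works on every non-key column
# and returns a DataFrame rather than a single Series.
all_cols = df.groupby(["Page", "Line"]).transform("mean")
print(list(all_cols.columns))

# Selecting y first gives a Series that can be assigned to one column.
only_y = df.groupby(["Page", "Line"])["y"].transform("mean")
df["y"] = only_y
print(df)
```

With only Page, Line and y in the frame the unselected form happens to yield a single column, so both versions behave the same on the question's data.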
zipa answered Feb 28 '23 05:02