Suppose I have the following dataframe:
import pandas as pd

df = pd.DataFrame(
    {'X': ['a', 'a', 'b', 'a', 'b'],
     'Y': [2, 4, 8, 10, 5]})
which looks like this:
   X   Y
0  a   2
1  a   4
2  b   8
3  a  10
4  b   5
How can I replace the first element of each group (grouped by X) with that group's mean of Y?
The expected output:
   X      Y
0  a   5.33
1  a   4.00
2  b   6.50
3  a  10.00
4  b   5.00
Sorry if this question is too basic, but I am new to Python and have only just started learning it.
Use GroupBy.transform to compute the mean of each group, aligned to the original rows, and then use numpy.where with a mask from Series.duplicated to replace only the first value in each group:
import numpy as np

df['Y'] = np.where(df.X.duplicated(), df.Y, df.groupby("X")['Y'].transform('mean'))
print(df)
   X          Y
0  a   5.333333
1  a   4.000000
2  b   6.500000
3  a  10.000000
4  b   5.000000
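To see why this works, the one-liner can be split into its parts. The following is only a sketch on the question's data; the names is_repeat, group_mean and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['a', 'a', 'b', 'a', 'b'],
                   'Y': [2, 4, 8, 10, 5]})

# True for every row whose X value has already appeared in an earlier row,
# so the first row of each group is False
is_repeat = df['X'].duplicated()

# Mean of Y per group, returned with the same index and length as df
group_mean = df.groupby('X')['Y'].transform('mean')

# Keep the original Y on repeated rows; use the group mean on first rows
result = df.assign(Y=np.where(is_repeat, df['Y'], group_mean))
print(result)
```

Because transform returns a value for every row rather than one value per group, the mask, the original column and the means all line up row by row, which is what numpy.where needs.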
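Another solution uses DataFrame.loc to assign the group means only to the first row of each group (the rows where Series.duplicated is False):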
df.loc[~df.X.duplicated(), 'Y'] = df.groupby("X")['Y'].transform('mean')
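One caveat with the .loc variant: Y starts out as an integer column, and depending on the pandas version, writing non-integer means into it may produce a dtype warning or an error. A hedged sketch that sidesteps this by casting Y to float first (the cast and the name first_in_group are my additions, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'X': ['a', 'a', 'b', 'a', 'b'],
                   'Y': [2, 4, 8, 10, 5]})

# Assumption: cast to float up front so the column can hold fractional means
df['Y'] = df['Y'].astype(float)

# True only on the first row of each X group
first_in_group = ~df['X'].duplicated()

# The right-hand Series is aligned on the index, so only the masked rows change
df.loc[first_in_group, 'Y'] = df.groupby('X')['Y'].transform('mean')
print(df)
```

The assignment modifies df in place, whereas the numpy.where version builds a whole new column; on small data either choice should be fine.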