How to replace 0 values with mean based on groupby

Question

I have a dataframe with two features: gps_height (numeric) and region (categorical).

The gps_height column contains many 0 values, which represent missing values in this case. I want to fill each 0 with the mean of its corresponding region.

My reasoning so far: drop the zero values, then take the mean of gps_height grouped by region:

df[df.gps_height !=0].groupby(['region']).mean()

But how do I replace the zero values in my dataframe with those mean values?

Sample data:

   gps_height  region
0        1390  Iringa
1        1400    Mara
2           0  Iringa
3         250  Iringa
...

jezrael · Accepted Answer

Use this sample DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'region':list('aaabbbccc'),
                   'gps_height':[2,3,0,3,4,5,1,0,0]})
print(df)
  region  gps_height
0      a           2
1      a           3
2      a           0
3      b           3
4      b           4
5      b           5
6      c           1
7      c           0
8      c           0

Replace 0 with missing values (NaN), then fill the NaNs using fillna with the per-group means computed by GroupBy.transform:

df['gps_height'] = df['gps_height'].replace(0, np.nan)
df['gps_height'] = df['gps_height'].fillna(df.groupby('region')['gps_height'].transform('mean'))
print(df)
  region  gps_height
0      a         2.0
1      a         3.0
2      a         2.5
3      b         3.0
4      b         4.0
5      b         5.0
6      c         1.0
7      c         1.0
8      c         1.0
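One case the output above does not show is a region whose values are all 0: after the replacement that group contains only NaN, so its mean is NaN and fillna leaves the rows unfilled. The sketch below illustrates one possible way to handle it, assuming a fallback to the overall mean is acceptable for your data. The region 'd' and the fallback rule are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: region 'd' has only zeros, so it has no non-zero mean.
df = pd.DataFrame({'region': list('aaabbdd'),
                   'gps_height': [2, 3, 0, 3, 5, 0, 0]})

# Treat 0 as missing.
heights = df['gps_height'].replace(0, np.nan)

# Per-region mean of the non-missing values, aligned to the original index.
# For region 'd' this is NaN because every value in the group is missing.
region_mean = heights.groupby(df['region']).transform('mean')

# Fill from the region mean first, then fall back to the overall mean
# (assumed fallback; choose whatever default suits your data).
df['gps_height'] = heights.fillna(region_mean).fillna(heights.mean())
print(df)
```

Here regions 'a' and 'b' are filled from their own means, while the rows of region 'd' receive the mean of all non-zero values.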

Alternatively, filter out the 0 values, aggregate the means per region, and map them onto the rows that contain 0:

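# Boolean mask: True where gps_height is a real (non-zero) reading
m = df['gps_height'] != 0
# Mean of the non-zero values per region, indexed by region
s = df[m].groupby('region')['gps_height'].mean()
# Overwrite only the zero rows with their region's mean
df.loc[~m, 'gps_height'] = df['region'].map(s)
# Alternative:
# df['gps_height'] = np.where(~m, df['region'].map(s), df['gps_height'])
print(df)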
  region  gps_height
0      a         2.0
1      a         3.0
2      a         2.5
3      b         3.0
4      b         4.0
5      b         5.0
6      c         1.0
7      c         1.0
8      c         1.0
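The mask-and-map steps can also be wrapped in a small helper and applied to the four rows shown in the question. This is only a sketch: the function name fill_zeros_with_group_mean and its parameters are hypothetical, and the cast to float is an added assumption so the means can be stored without changing the column dtype during assignment.

```python
import pandas as pd

def fill_zeros_with_group_mean(df, value_col, group_col):
    """Return a copy of df where zeros in value_col are replaced by the
    mean of the non-zero values in the same group_col group.
    (Hypothetical helper, not part of pandas.)"""
    out = df.copy()
    # Cast to float up front so fractional means fit the column dtype.
    out[value_col] = out[value_col].astype(float)
    nonzero = out[value_col] != 0
    # Mean of the non-zero values per group, indexed by group label.
    group_means = out.loc[nonzero].groupby(group_col)[value_col].mean()
    # Look up each zero row's group mean and write it back.
    out.loc[~nonzero, value_col] = out.loc[~nonzero, group_col].map(group_means)
    return out

# The four rows shown in the question's sample data.
sample = pd.DataFrame({'gps_height': [1390, 1400, 0, 250],
                       'region': ['Iringa', 'Mara', 'Iringa', 'Iringa']})
filled = fill_zeros_with_group_mean(sample, 'gps_height', 'region')
print(filled)
```

For these rows the zero in Iringa becomes 820.0, the mean of 1390 and 250, and the input frame is left unchanged because the helper works on a copy.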

Tags: python, replace, pandas, transform

Niels Hoogeveen
