
Pandas: filling missing values by mean in each group

This should be straightforward, but the closest thing I've found is this post: "pandas: Filling missing values within a group", and I still can't solve my problem.

Suppose I have the following dataframe

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
                       'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C']})

      name  value
    0    A      1
    1    A    NaN
    2    B    NaN
    3    B      2
    4    B      3
    5    B      1
    6    C      3
    7    C    NaN
    8    C      3

and I'd like to fill in each "NaN" with the mean value of its "name" group, i.e.

      name  value
    0    A      1
    1    A      1
    2    B      2
    3    B      2
    4    B      3
    5    B      1
    6    C      3
    7    C      3
    8    C      3

I'm not sure where to go after:

grouped = df.groupby('name').mean() 

Thanks a bunch.

BlueFeet asked Nov 13 '13 22:11


2 Answers

One way would be to use transform:

    >>> df
      name  value
    0    A      1
    1    A    NaN
    2    B    NaN
    3    B      2
    4    B      3
    5    B      1
    6    C      3
    7    C    NaN
    8    C      3
    >>> df["value"] = df.groupby("name").transform(lambda x: x.fillna(x.mean()))
    >>> df
      name  value
    0    A      1
    1    A      1
    2    B      2
    3    B      2
    4    B      3
    5    B      1
    6    C      3
    7    C      3
    8    C      3
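For anyone running this as a script rather than in the interpreter, here is a minimal self-contained sketch of the same transform approach. Selecting the `value` column before calling `transform` is an assumption added here, not part of the session above; it should keep the lambda from being applied to any other columns a larger frame might have. The variable name `filled` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
})

# The lambda is called once per group; x holds that group's 'value' entries.
# x.mean() skips NaN by default, so each NaN is replaced by the mean of the
# non-missing values in the same group.
filled = df.groupby('name')['value'].transform(lambda x: x.fillna(x.mean()))

# transform returns a Series with the same index as df, so it can be
# assigned straight back to the column.
df['value'] = filled
print(df)
```

Note that a group whose values are all missing has no mean to fill with, so its entries would stay NaN.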
DSM answered Sep 18 '22 00:09


fillna + groupby + transform + mean

This seems intuitive:

df['value'] = df['value'].fillna(df.groupby('name')['value'].transform('mean')) 

The groupby + transform syntax maps the groupwise mean to the index of the original dataframe. This is roughly equivalent to @DSM's solution, but avoids the need to define an anonymous lambda function.
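To make the index mapping concrete, the one-liner can be split into two steps. This is only an illustrative sketch on the question's data; the names `group_means` and `result` are mine, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': [1, np.nan, np.nan, 2, 3, 1, 3, np.nan, 3],
})

# Step 1: transform('mean') gives one entry per row (not one per group),
# carrying the mean of that row's group and sharing df's index.
group_means = df.groupby('name')['value'].transform('mean')
print(group_means.tolist())

# Step 2: fillna with a Series fills each NaN from the entry that has the
# same index label; values that are already present are left untouched.
result = df['value'].fillna(group_means)
print(result.tolist())
```

Here every row in group B gets a group mean of 2.0, but only the missing row at index 2 actually takes that value in `result`.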

jpp answered Sep 21 '22 00:09