Pandas fillna() based on specific column attribute

Let's say I have this table:

Type   Killed   Survived
Dog    5        2
Dog    3        4
Cat    1        7
Dog    nan      3
cow    nan      2

One of the values in Killed is missing for [Type] = Dog.

I want to impute the mean of [Killed] for [Type] = Dog.

My code is as follows:

  1. Search for the mean

df[df['Type'] == 'Dog'].mean().round()

This will give me the mean (around 2.25)

  2. Impute the mean (this is where the problem begins)

df.loc[(df['Type'] == 'Dog') & (df['Killed'])].fillna(2.25, inplace = True)

The code runs, but the value is not imputed; the NaN is still there.

My question is: how do I impute the mean of [Killed] for the rows where [Type] = Dog?

Phurich.P asked Mar 12 '23 06:03


2 Answers

This works for me:

# .ix has been removed from current pandas, so .loc is used here
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(2.25)
print (df)
  Type  Killed  Survived
0  Dog    5.00         2
1  Dog    3.00         4
2  Cat    1.00         7
3  Dog    2.25         3
4  cow     NaN         2
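
A likely reason the attempt in the question leaves the NaN in place: selecting rows with a boolean mask returns a new object, so fillna(..., inplace=True) fills that copy and the original frame never sees the change. The sketch below illustrates this under two assumptions: the DataFrame is rebuilt from the question's table, and the mask is simplified to Type == 'Dog'. It is not the asker's exact code.

```python
import numpy as np
import pandas as pd

# Rebuilt from the table in the question (assumed data).
df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})
is_dog = df['Type'] == 'Dog'

# Boolean-mask selection returns a copy; filling it does not touch df.
subset = df.loc[is_dog]
subset.fillna(2.25, inplace=True)
nan_after_inplace = int(df['Killed'].isna().sum())

# Assigning the filled values back through .loc writes into df itself.
df.loc[is_dog, 'Killed'] = df.loc[is_dog, 'Killed'].fillna(2.25)
nan_after_assign = int(df['Killed'].isna().sum())

print(nan_after_inplace, nan_after_assign)
```

The first count still includes the Dog row, while the second only counts the cow row, which no Dog-specific fill should touch.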

If you need to fill with a Series, because the mean covers the two columns Killed and Survived:

# numeric_only=True skips the string column Type
m = df[df['Type'] == 'Dog'].mean(numeric_only=True).round()
print (m)
Killed      4.0
Survived    3.0
dtype: float64

df.loc[df['Type'] == 'Dog'] = df.loc[df['Type'] == 'Dog'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2

If you need to fill only the Killed column:

# omit round() if you don't need rounding
m = round(df.loc[df['Type'] == 'Dog', 'Killed'].mean())
print (m)
4

df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(m)
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2

You can store the filtered selection and reuse it like this:

filtered = df.loc[df['Type'] == 'Dog', 'Killed']
print (filtered)
0    5.0
1    3.0
3    NaN
Name: Killed, dtype: float64

df.loc[df['Type'] == 'Dog', 'Killed'] = filtered.fillna(filtered.mean())
print (df)
  Type  Killed  Survived
0  Dog     5.0         2
1  Dog     3.0         4
2  Cat     1.0         7
3  Dog     4.0         3
4  cow     NaN         2
jezrael answered Mar 13 '23 20:03


groupby with transform

df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
df
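
To make the outcome of this step checkable in text, the sketch below rebuilds the setup and inspects the filled column. It also shows a lambda-free spelling, transform('mean') combined with Series.fillna; that variant is an alternative offered here, not part of the original answer. The names group_mean, filled and via_lambda are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

# Group mean of Killed broadcast back to every row; NaN is skipped when averaging.
group_mean = df.groupby('Type')['Killed'].transform('mean')

# Fill each missing value with the mean of its own group.
filled = df['Killed'].fillna(group_mean)

# The lambda-based transform from the answer, for comparison.
via_lambda = df.groupby('Type')['Killed'].transform(lambda x: x.fillna(x.mean()))

print(filled.tolist())
```

The missing Dog value becomes 4.0, the mean of 5 and 3. The Cow row stays NaN, because its group has no observed value to average.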


If you want the np.nan entries to count as zeros when calculating the mean:

df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.fillna(0).mean()))
df
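
To make the difference concrete: when missing values count as zero, the Dog mean becomes (5 + 3 + 0) / 3 instead of (5 + 3) / 2, and the Cow row is filled with 0 rather than staying NaN. A small check, assuming the same setup as above (zero_mean_fill is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
        ['Dog', 5, 2],
        ['Dog', 3, 4],
        ['Cat', 1, 7],
        ['Dog', np.nan, 3],
        ['Cow', np.nan, 2]
    ], columns=['Type', 'Killed', 'Survived'])

# Missing values are replaced by 0 before averaging, so they pull the mean down.
zero_mean_fill = df.groupby('Type')['Killed'].transform(
    lambda x: x.fillna(x.fillna(0).mean()))

print(zero_mean_fill.round(2).tolist())
```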


piRSquared answered Mar 13 '23 19:03