Let's say I have this table:
Type | Killed | Survived
Dog  | 5      | 2
Dog  | 3      | 4
Cat  | 1      | 7
Dog  | nan    | 3
cow  | nan    | 2
One of the values in Killed is missing for Type = Dog.
I want to impute the mean of Killed for Type = Dog.
My code is as follows:
df[df['Type'] == 'Dog'].mean().round()
This will give me the mean (around 2.25)
df.loc[(df['Type'] == 'Dog') & (df['Killed'])].fillna(2.25, inplace = True)
The code runs, but the value is not imputed; the NaN value is still there.
My question is: how do I impute the mean of Killed based on Type = Dog?
This works for me:
# .ix was removed in pandas 1.0; .loc does the same label-based selection here
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(2.25)
print (df)
Type Killed Survived
0 Dog 5.00 2
1 Dog 3.00 4
2 Cat 1.00 7
3 Dog 2.25 3
4 cow NaN 2
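A likely reason the original attempt has no effect: selecting rows with a boolean mask returns a copy, so fillna(..., inplace=True) fills that copy and df itself is never touched. Below is a minimal sketch of the difference, assuming the sample data from the question. The names mask, subset and still_missing are only for illustration, and depending on the pandas version the inplace line may emit a warning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})
mask = df['Type'] == 'Dog'

# Boolean-mask selection produces a copy; filling it in place
# changes only the copy, not df.
subset = df.loc[mask]
subset.fillna(4.0, inplace=True)

# Count the NaNs left in df: both the Dog and the cow NaN are still there.
still_missing = int(df['Killed'].isna().sum())
print(still_missing)

# Assigning the filled values back through .loc writes into df itself.
df.loc[mask, 'Killed'] = df.loc[mask, 'Killed'].fillna(4.0)
print(df)
```

The value 4.0 is the mean of the two observed Dog values (5 and 3).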
If you need to fillna with a Series, because there are two numeric columns, Killed and Survived:
# numeric_only=True skips the string column Type
m = df[df['Type'] == 'Dog'].mean(numeric_only=True).round()
print (m)
Killed 4.0
Survived 3.0
dtype: float64
df.loc[df['Type'] == 'Dog'] = df.loc[df['Type'] == 'Dog'].fillna(m)
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
If you need to fillna only in the Killed column:
# omit round() if you don't need rounding
m = round(df.loc[df['Type'] == 'Dog', 'Killed'].mean())
print (m)
4
df.loc[df['Type'] == 'Dog', 'Killed'] = df.loc[df['Type'] == 'Dog', 'Killed'].fillna(m)
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
You can reuse code like:
filtered = df.loc[df['Type'] == 'Dog', 'Killed']
print (filtered)
0 5.0
1 3.0
3 NaN
Name: Killed, dtype: float64
df.loc[df['Type'] == 'Dog', 'Killed'] = filtered.fillna(filtered.mean())
print (df)
Type Killed Survived
0 Dog 5.0 2
1 Dog 3.0 4
2 Cat 1.0 7
3 Dog 4.0 3
4 cow NaN 2
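The same pattern can be wrapped in a small helper. This is only a sketch: fill_group_mean and its parameter names are hypothetical, not part of pandas, and the sample data is assumed to match the question.

```python
import numpy as np
import pandas as pd

def fill_group_mean(df, group_col, group_value, target_col):
    """Return a copy of df in which NaNs in target_col are replaced by the
    mean of target_col, restricted to rows where group_col == group_value."""
    out = df.copy()
    mask = out[group_col] == group_value
    filtered = out.loc[mask, target_col]
    # mean() skips NaN by default, so only observed values are averaged.
    out.loc[mask, target_col] = filtered.fillna(filtered.mean())
    return out

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Cat', 'Dog', 'cow'],
    'Killed': [5, 3, 1, np.nan, np.nan],
    'Survived': [2, 4, 7, 3, 2],
})

result = fill_group_mean(df, 'Type', 'Dog', 'Killed')
print(result)
```

Returning a copy keeps the original df untouched, which makes it easier to compare before and after.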
Use groupby with transform:
df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
import numpy as np
import pandas as pd

df = pd.DataFrame([
['Dog', 5, 2],
['Dog', 3, 4],
['Cat', 1, 7],
['Dog', np.nan, 3],
['Cow', np.nan, 2]
], columns=['Type', 'Killed', 'Survived'])
df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.mean()))
df
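For reference, here is a sketch of what the groupby approach yields on the sample data, using an equivalent spelling that passes the per-group means to fillna directly. The names group_mean and filled are illustrative. The Cow row stays NaN because its group has no observed value to average.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['Dog', 5, 2],
    ['Dog', 3, 4],
    ['Cat', 1, 7],
    ['Dog', np.nan, 3],
    ['Cow', np.nan, 2]
], columns=['Type', 'Killed', 'Survived'])

# Per-row group mean: each row receives the mean of its own Type group.
group_mean = df.groupby('Type')['Killed'].transform('mean')

# fillna aligns on the index, so each NaN takes its own group's mean.
filled = df['Killed'].fillna(group_mean)
print(filled.tolist())  # [5.0, 3.0, 1.0, 4.0, nan]
```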
If you meant to count the np.nan values as zeros when calculating the mean:
df.Killed = df.groupby('Type').Killed.transform(lambda x: x.fillna(x.fillna(0).mean()))
df
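To make the difference concrete: treating the missing Dog value as 0 gives (5 + 3 + 0) / 3, about 2.67, rather than 4.0, and the Cow group, which contains only a missing value, is filled with 0. A short sketch on the same sample data (zero_mean is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['Dog', 5, 2],
    ['Dog', 3, 4],
    ['Cat', 1, 7],
    ['Dog', np.nan, 3],
    ['Cow', np.nan, 2]
], columns=['Type', 'Killed', 'Survived'])

# Within each group, NaN counts as 0 when the mean is computed,
# and that mean then replaces the NaN.
zero_mean = df.groupby('Type')['Killed'].transform(
    lambda x: x.fillna(x.fillna(0).mean())
)
print(zero_mean.round(2).tolist())  # [5.0, 3.0, 1.0, 2.67, 0.0]
```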