The problem: take the Titanic dataset from Kaggle. I have a dataframe with the columns "Pclass", "Sex" and "Age". I need to fill the NaN values in the "Age" column with the median for a certain group. If a passenger is a woman from 1st class, I would like to fill her age with the median for 1st class women, not with the median of the whole "Age" column.
The question is: how do I make this change in a certain slice only?
I tried:
data['Age'][(data['Sex'] == 'female')&(data['Pclass'] == 1)&(data['Age'].isnull())].fillna(median)
where "median" is my value, but nothing changes; "inplace=True" didn't help.
Thanks a lot!
Slicing a DataFrame in Pandas includes the following steps: ensure Python is installed (or install ActivePython), import a dataset, create a DataFrame, and slice the DataFrame.
Pandas DataFrame fillna() method: the fillna() method replaces NULL (NaN) values with a specified value. It returns a new DataFrame object unless the inplace parameter is set to True, in which case the replacement is done in the original DataFrame instead.
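To illustrate why the attempt in the question leaves the dataframe unchanged, here is a minimal sketch on a small made-up frame (the three rows and the fill value 40.0 are assumptions for illustration only). Selecting with a boolean mask produces a new Series, and fillna() then returns yet another new object, so nothing is written back to `data`:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are made up for this sketch.
data = pd.DataFrame({
    "Sex": ["female", "female", "male"],
    "Pclass": [1, 1, 1],
    "Age": [40.0, np.nan, np.nan],
})

mask = (data["Sex"] == "female") & (data["Pclass"] == 1) & data["Age"].isnull()

# Boolean indexing returns a new Series, and fillna() returns another one;
# the result lives only in `filled`, not in `data`.
filled = data["Age"][mask].fillna(40.0)

print(filled.tolist())             # values of the temporary, filled selection
print(data["Age"].isnull().sum())  # missing values still present in `data`
```

The filled values only reach the dataframe if the result is assigned back, for example through `data.loc[mask, 'Age'] = ...`.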
I believe you need to filter by boolean masks and assign the result back:
import numpy as np
import pandas as pd

data = pd.DataFrame({'a':list('aaaddd'),
                     'Sex':['female','female','male','female','female','male'],
                     'Pclass':[1,2,1,2,1,1],
                     'Age':[40,20,30,20,np.nan,np.nan]})
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 NaN 1 female d
5 NaN 1 male d
#boolean mask
mask1 = (data['Sex'] == 'female')&(data['Pclass'] == 1)
#get median by mask without NaNs
med = data.loc[mask1, 'Age'].median()
print (med)
40.0
#replace NaNs
data.loc[mask1, 'Age'] = data.loc[mask1, 'Age'].fillna(med)
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 NaN 1 male d
Which is the same as:
mask2 = mask1 &(data['Age'].isnull())
data.loc[mask2, 'Age'] = med
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 NaN 1 male d
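The mask-and-assign steps above can be wrapped in a small helper so they can be reused for other groups. This is only a sketch: the function name `fill_age_for_group` and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def fill_age_for_group(df, sex, pclass):
    """Fill missing Age values of one (Sex, Pclass) group with that group's median.

    Hypothetical helper for illustration; it modifies `df` in place and returns it.
    """
    in_group = (df["Sex"] == sex) & (df["Pclass"] == pclass)
    median_age = df.loc[in_group, "Age"].median()  # median() skips NaN by default
    df.loc[in_group & df["Age"].isnull(), "Age"] = median_age
    return df

data = pd.DataFrame({
    "Sex": ["female", "female", "male", "female", "female", "male"],
    "Pclass": [1, 2, 1, 2, 1, 1],
    "Age": [40, 20, 30, 20, np.nan, np.nan],
})

# Only the male / 1st class group is filled; other groups are left untouched.
data = fill_age_for_group(data, "male", 1)
print(data)
```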
EDIT:
If you need to replace the NaNs in all groups by the median of each group:
#transform keeps the original index, so the result aligns with data
data['Age'] = data.groupby(["Sex","Pclass"])["Age"].transform(lambda x: x.fillna(x.median()))
print (data)
Age Pclass Sex a
0 40.0 1 female a
1 20.0 2 female a
2 30.0 1 male a
3 20.0 2 female d
4 40.0 1 female d
5 30.0 1 male d
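A possible alternative for the all-groups case, sketched here on the same sample data, is to compute the per-group median once with `transform("median")` and pass the resulting Series to `fillna()`. This avoids the Python-level lambda; whether it is noticeably faster on the real Titanic data is an assumption that has not been measured here.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Sex": ["female", "female", "male", "female", "female", "male"],
    "Pclass": [1, 2, 1, 2, 1, 1],
    "Age": [40, 20, 30, 20, np.nan, np.nan],
})

# One median per row, taken from the row's own (Sex, Pclass) group;
# the result has the same index as `data`.
group_median = data.groupby(["Sex", "Pclass"])["Age"].transform("median")

# fillna() with a Series fills each NaN from the value at the same index label.
data["Age"] = data["Age"].fillna(group_median)
print(data)
```

Note that a group whose ages are all missing has no median, so its NaNs would remain after this step.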