Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:
mode = df.filter(["workclass", "native-country"]).mode()
which returns a dataframe:
workclass native-country
0 Private United-States
However,
df.filter(["workclass", "native-country"]).fillna(mode)
does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?
Example 1: Filling missing column values with fixed values. The fillna() function can impute the missing values in every column named in a dictionary of values. The limitation of this method is that only constant values can be filled in.
You can also use df.replace(np.nan, 0) to replace all NaN values with zero. This replaces NaN with zero in every column of the DataFrame.
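As a minimal sketch of the two constant-fill approaches just described (the small frame, the "hours" and "age" columns, and the fill values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the census set from the question.
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "State-gov"],
    "hours": [40.0, np.nan, 35.0],
    "age": [np.nan, 31.0, 52.0],
})

# Per-column constants: keys are column names, values are the fill values.
# Columns missing from the dict ("age" here) are left untouched.
filled = df.fillna({"workclass": "Unknown", "hours": 0})

# One constant for every NaN. Applied to the numeric columns only here,
# since zero is rarely a sensible fill for a text column.
zeroed = df[["hours", "age"]].replace(np.nan, 0)

print(filled)
print(zeroed)
```

Both calls return new objects, so assign the result back if you want to keep it.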
If you want to impute missing values with the mode in some columns of a dataframe df, you can just fillna by a Series created by selecting the first row by position with iloc:
cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
Or:
df[cols]=df[cols].fillna(mode.iloc[0])
Your solution:
df[cols]=df.filter(cols).fillna(mode.iloc[0])
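The reason the original fillna(mode) call seems to do nothing is, as far as I can tell, alignment: when the fill value is a DataFrame, pandas matches it on row labels as well as column names, so a one-row mode frame can only fill the row labelled 0. A Series taken with iloc[0] is matched on column names only. (Note also that fillna returns a new object, so the result has to be assigned back.) A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with NaNs in several different rows.
df = pd.DataFrame({
    "workclass": [np.nan, "Private", "Private", np.nan],
    "native-country": ["United-States", np.nan, "United-States", np.nan],
})
cols = ["workclass", "native-country"]
mode = df[cols].mode()  # one-row DataFrame whose only index label is 0

# DataFrame as fill value: aligned on index and columns,
# so only NaNs in the row labelled 0 can be filled.
by_frame = df[cols].fillna(mode)

# Series as fill value: its index holds the column names,
# so each value is applied down the whole matching column.
by_series = df[cols].fillna(mode.iloc[0])

print(by_frame)
print(by_series)
```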
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})
print(df)
col native-country workclass
0 2 United-States Private
1 3 NaN Private
2 7 Canada NaN
3 8 NaN another
4 9 United-States NaN
mode = df.filter(["workclass", "native-country"]).mode()
print(mode)
workclass native-country
0 Private United-States
cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print(df)
col native-country workclass
0 2 United-States Private
1 3 United-States Private
2 7 Canada Private
3 8 United-States another
4 9 United-States Private
You can do it like this:
df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])
For example,
import pandas as pd
d={
'key3': [1,4,4,4,5],
'key2': [6,6,4],
'key1': [6,4,4],
}
df=pd.DataFrame.from_dict(d,orient='index').transpose()
Then df is
key3 key2 key1
0 1 6 6
1 4 6 4
2 4 4 4
3 4 NaN NaN
4 5 NaN NaN
Then by doing:
l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])
we get that df is
key3 key2 key1
0 1 6 6
1 4 6 4
2 4 4 4
3 4 6 4
4 5 6 4
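If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: fill_with_mode is a hypothetical name, not a pandas function, and it assumes that taking the first row of mode() is an acceptable tie-break (mode() can return several rows when values are tied).

```python
import numpy as np
import pandas as pd


def fill_with_mode(df, cols):
    """Return a copy of df with NaNs in `cols` replaced by each column's mode."""
    out = df.copy()
    modes = out[cols].mode()
    if modes.empty:
        # Every selected column is entirely NaN, so there is nothing to fill with.
        return out
    # First row of the mode frame: a Series indexed by column name.
    out[cols] = out[cols].fillna(modes.iloc[0])
    return out


# Same values as the key1/key2/key3 example, built directly with NaNs.
df = pd.DataFrame({
    "key3": [1, 4, 4, 4, 5],
    "key2": [6, 6, 4, np.nan, np.nan],
    "key1": [6, 4, 4, np.nan, np.nan],
})
result = fill_with_mode(df, ["key1", "key2"])
print(result)
```

Working on a copy keeps the original frame intact, which makes it easy to compare the data before and after imputation.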