
Pandas Fillna of Multiple Columns with Mode of Each Column

Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

Nick asked Mar 18 '17 04:03





2 Answers

To impute missing values in selected columns of a DataFrame df with each column's mode, pass fillna a Series rather than a DataFrame. Selecting the first row of the mode result by position with iloc gives such a Series, indexed by column name:

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or, reusing the mode DataFrame from the question:

df[cols]=df[cols].fillna(mode.iloc[0])

Or, keeping the filter call from your attempt:

df[cols]=df.filter(cols).fillna(mode.iloc[0])

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
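
To see why the original attempt left the NaNs in place, here is a minimal sketch on the same sample data. It relies on fillna aligning a DataFrame argument on row labels as well as column names, whereas a Series argument is matched on column names only. The variable names filled_with_frame, filled_with_series, nans_left_frame and nans_left_series are illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
    "col": [2, 3, 7, 8, 9],
})
cols = ["workclass", "native-country"]

# mode() returns a DataFrame; here it has a single row labelled 0.
mode = df[cols].mode()

# A DataFrame fill value is aligned on row labels and column names,
# so only a NaN located in row 0 could be replaced.
filled_with_frame = df[cols].fillna(mode)

# A Series fill value (row 0 of the mode frame) is matched on column
# names only, so every NaN in a column receives that column's mode.
filled_with_series = df[cols].fillna(mode.iloc[0])

nans_left_frame = int(filled_with_frame.isna().sum().sum())
nans_left_series = int(filled_with_series.isna().sum().sum())
print(nans_left_frame, nans_left_series)
```

On this sample no NaN sits in row 0, so the DataFrame-based fill changes nothing, while the Series-based fill leaves no missing values in the two columns.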
jezrael answered Sep 24 '22 03:09



You can do it like this:

df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])

For example,

import pandas as pd

# Lists of unequal length: the shorter columns are padded with NaN
d = {
    'key3': [1, 4, 4, 4, 5],
    'key2': [6, 6, 4],
    'key1': [6, 4, 4],
}

df = pd.DataFrame.from_dict(d, orient='index').transpose()

Then df is

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4   NaN   NaN
4     5   NaN   NaN

Then by doing:

l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])

we get that df is

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4     6     4
4     5     6     4
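
One caveat worth sketching: when a column has tied modes, mode() returns more than one row (padding other columns with NaN), and iloc[0] simply takes the first row. The frame below and the names modes, first and filled are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Column "a" has two tied modes (1 and 2); column "b" has a single mode (3).
df = pd.DataFrame({
    "a": [1, 1, 2, 2, np.nan],
    "b": [3, 3, 3, 4, np.nan],
})

# With a tie, mode() yields one row per tied value; "b" is padded with NaN.
modes = df.mode()

# Row 0 holds the first mode of each column, which for "a" is the smaller value.
first = modes.iloc[0]

filled = df.fillna(first)
print(modes)
print(filled)
```

If a different tie-breaking rule is wanted, the fill Series would have to be built some other way before calling fillna.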
Miriam Farber answered Sep 20 '22 03:09
