Pandas Fillna of Multiple Columns with Mode of Each Column

Tags:

python

pandas

numpy

data-science

Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

asked Mar 18 '17 04:03

Nick

2 Answers

To impute missing values with the mode in selected columns of a DataFrame df, pass fillna a Series of modes, obtained by selecting the first row of the mode() result by position with iloc:

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or:

df[cols]=df[cols].fillna(mode.iloc[0])

Or, keeping the filter call from your question:

df[cols]=df.filter(cols).fillna(mode.iloc[0])

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
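
As for why the original fillna(mode) call appeared to do nothing: when fillna receives a DataFrame, pandas aligns it on row labels as well as column names, and the mode DataFrame only has a row labelled 0. A Series taken with iloc[0] is keyed by column name alone, so it applies to every row. The sketch below reuses the sample data above to compare the two; the names by_frame and by_series are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
    "col": [2, 3, 7, 8, 9],
})
cols = ["workclass", "native-country"]

# mode() returns a DataFrame; here it has a single row with index label 0.
mode = df[cols].mode()

# Filling with the DataFrame aligns on row labels, so only row 0 could be filled.
by_frame = df[cols].fillna(mode)

# Filling with the first row as a Series maps each column name to its fill value.
by_series = df[cols].fillna(mode.iloc[0])

remaining_by_frame = int(by_frame.isna().sum().sum())
remaining_by_series = int(by_series.isna().sum().sum())
print(remaining_by_frame, remaining_by_series)
```

Note also that fillna returns a new object, so the result has to be assigned back (df[cols] = ...) for df itself to change.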

162

answered Sep 24 '22 03:09

jezrael

You can do it like this:

df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])

For example,

import pandas as pd

d = {
    'key3': [1, 4, 4, 4, 5],
    'key2': [6, 6, 4],
    'key1': [6, 4, 4],
}

# The shorter lists are padded with NaN when the frame is built.
df = pd.DataFrame.from_dict(d, orient='index').transpose()

Then df is

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4   NaN   NaN
4     5   NaN   NaN

Then running:

modes = df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]] = df[["key1", "key2"]].fillna(value=modes.iloc[0])

gives:

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4     6     4
4     5     6     4
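
One caveat worth keeping in mind: mode() can return more than one row when a column has tied modes, and columns with fewer modes are padded with NaN in the extra rows. Taking iloc[0] then picks the first mode of each column, which for sortable values is the smallest. The sketch below uses made-up columns a and b purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has a tie between "x" and "y".
df = pd.DataFrame({
    "a": ["x", "y", "x", "y", np.nan],
    "b": ["p", "p", "q", np.nan, np.nan],
})

# Two rows come back because "a" has two modes; "b" is padded with NaN in row 1.
modes = df.mode()

# iloc[0] keeps only the first mode per column.
first_modes = modes.iloc[0]

filled = df.fillna(first_modes)
print(modes)
print(filled)
```

If ties matter for your data, inspect the full mode() result before deciding which row to fill with.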

answered Sep 20 '22 03:09

Miriam Farber
