Fillna in multiple columns in place in Python Pandas

I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.

Consider this small fictitious example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name':['Jack','Sue',np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',np.nan,np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small DataFrame, 3 lines is probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?

ozzy asked Jan 21 '16 01:01



4 Answers

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can follow it with a second fillna:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)
df.fillna(0, inplace=True)

Edit (22 Apr 2021)

An earlier version of this answer chained the two in-place calls on one line. That does not work: fillna returns None when inplace=True, so there is nothing for the second call to chain onto. You can still chain, but you must assign the result back to df instead of modifying in place, like so:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
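
A self-contained sketch of the chained, assign-back form, run against the question's example frame. It assumes numpy is imported explicitly for np.nan, and the variable name filled is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# Per-column fill for the string columns first, then 0 for whatever is left.
# Without inplace=True each call returns a new DataFrame, so chaining works.
filled = df.fillna({'Name': '.', 'City': '.'}).fillna(0)

print(filled)
```

The order matters here: if fillna(0) ran first, the string columns would receive 0 as well and the dict would have nothing left to fill.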
Rob Bulmahn answered Oct 17 '22 12:10


You could use apply over the columns and check dtype.kind to decide whether each column is numeric:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
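
For reference, dtype.kind is a one-character code, and the string 'biufc' covers boolean, signed integer, unsigned integer, floating-point, and complex columns. A small sketch on a hypothetical frame (the column names are made up for illustration) shows the codes:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'n': [1.5, np.nan],                            # float column
    'i': [1, 2],                                   # integer column
    'flag': [True, False],                         # boolean column
    'when': pd.to_datetime(['2016-01-21', None]),  # datetime column
})

# Map each column name to its dtype kind code.
kinds = {col: demo[col].dtype.kind for col in demo.columns}
print(kinds)
# {'n': 'f', 'i': 'i', 'flag': 'b', 'when': 'M'}
```

Any column whose kind is not in 'biufc', such as the datetime column here, takes the else branch of the lambda.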
Anton Protopopov answered Oct 17 '22 12:10


You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)
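
The dtype-driven column selection can also be folded into a single fillna call by building the per-column dict programmatically. This is a minimal sketch, not part of the answer as posted: fill_by_dtype is a hypothetical helper name, and it assumes every non-numeric column holds strings.

```python
import numpy as np
import pandas as pd

def fill_by_dtype(frame, num_fill=0, other_fill='.'):
    """Hypothetical helper: fill NaN per column, chosen by dtype."""
    numeric_cols = frame.select_dtypes(include='number').columns
    # Assumption: every column that is not numeric holds strings.
    values = {col: (num_fill if col in numeric_cols else other_fill)
              for col in frame.columns}
    # fillna accepts a dict mapping column name -> fill value.
    return frame.fillna(values)

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

result = fill_by_dtype(df)
print(result)
```

select_dtypes(include='number') does not match boolean columns, so under this sketch they would fall into the non-numeric branch.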
Bob Baxley answered Oct 17 '22 13:10


Define a function:

import numpy as np

def myfillna(series):
    # Compare dtypes with ==; pd.np is no longer available in recent pandas.
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        return series

You can add other elif branches if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the DataFrame:

df = df.apply(myfillna)

Because the result is assigned back to df, this has the same effect as filling in place.
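
As a sketch of what an extra elif branch might look like, the variant below adds a datetime case. The name myfillna_extended, the use of dtype.kind instead of direct dtype comparison, and the 1970-01-01 placeholder are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

def myfillna_extended(series):
    """Hypothetical variant of myfillna with one extra dtype branch."""
    kind = series.dtype.kind
    if kind == 'f':        # float columns
        return series.fillna(0)
    elif kind == 'M':      # datetime64 columns
        # Placeholder date, chosen arbitrarily for this sketch.
        return series.fillna(pd.Timestamp('1970-01-01'))
    elif kind == 'O':      # object (string) columns
        return series.fillna('.')
    else:                  # anything else is returned untouched
        return series

df = pd.DataFrame({
    'Name': ['Jack', np.nan],
    'A': [1.0, np.nan],
    'When': pd.to_datetime(['2016-01-21', None]),
    'N': [1, 2],
})

# apply calls the function once per column and builds a new DataFrame.
result = df.apply(myfillna_extended)
print(result)
```

The original df still contains its NaN values after this call; only the returned frame is filled, which is why the result has to be assigned somewhere.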

latorrefabian answered Oct 17 '22 13:10