I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.
Consider this small fictitious example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
                   'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
                   'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})
Now, I can do it in 3 lines:
df['Name'] = df['Name'].fillna('.')
df['City'] = df['City'].fillna('.')
df = df.fillna(0)
Since this is a small DataFrame, 3 lines is probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?
We can use the fillna() function to fill the missing values of a DataFrame column by column, with the fill value for each column given by a dictionary. The limitation of this method is that each column can only be filled with a single constant value.
You can also use df.replace(np.nan, 0) to replace all NaN values with zero. This applies to every column of the DataFrame, string columns included.
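To see why replace() alone does not answer the question above, here is a minimal sketch on a made-up two-column frame (the data is illustrative, not from the original post). The Name column is created with an explicit object dtype, which is an assumption made so the example behaves the same regardless of how pandas infers string columns.

```python
import numpy as np
import pandas as pd

# Illustrative data: one string column (kept as object dtype) and one float column.
df = pd.DataFrame({
    'Name': pd.Series(['Jack', np.nan], dtype=object),
    'A': [1.0, np.nan],
})

# replace() does not look at dtypes, so the missing Name becomes 0 rather than '.'.
replaced = df.replace(np.nan, 0)
print(replaced)
```

So this is fine when every column should get the same fill value, but not when string columns need '.'.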
The fillna() method replaces missing (NaN) values with a specified value. It returns a new DataFrame unless the inplace parameter is set to True, in which case it modifies the original DataFrame instead.
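The difference between the two modes can be checked with a small sketch like the following. The one-column frame and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan]})

# Default call: a filled copy comes back and df itself still holds the NaN.
filled = df.fillna(0)
nan_count_before = int(df['A'].isna().sum())

# inplace=True: df itself is modified, so there is nothing to assign.
df.fillna(0, inplace=True)
nan_count_after = int(df['A'].isna().sum())

print(nan_count_before, nan_count_after)
```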
Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.
If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:
df.fillna({'Name':'.', 'City':'.'}, inplace=True)
This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:
df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)
Edit (22 Apr 2021)
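Functionality has (presumably / apparently) changed since the original post, and you can no longer chain 2 inplace fillna() operations (as far as I can tell, with inplace=True the first call returns None, so there is nothing left to chain on). You can still chain, but you must now assign the result of the chain to df instead of modifying in place, e.g. like so: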
df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
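With many columns, the dictionary does not have to be typed by hand. A possible sketch, assuming "numeric gets 0, everything else gets '.'" is the rule you want, is to build it from df.dtypes. The name fill_values is made up here; is_numeric_dtype comes from pandas.api.types. Note that "everything else" would also catch datetime or other non-string columns, which probably need their own fill value.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# One fill value per column: 0 for numeric dtypes, '.' for everything else.
fill_values = {col: 0 if is_numeric_dtype(dtype) else '.'
               for col, dtype in df.dtypes.items()}

# A single fillna call then handles all columns at once.
df = df.fillna(fill_values)
print(df)
```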
You could use apply on your columns, checking whether each column's dtype is numeric via dtype.kind:
res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))
print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
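The string 'biufc' in the lambda above is a set of NumPy dtype kind codes: 'b' boolean, 'i' signed integer, 'u' unsigned integer, 'f' floating point, 'c' complex. If you are unsure which branch a column will take, a quick sketch like this (on a made-up frame) lists the kind of each column:

```python
import numpy as np
import pandas as pd

# Illustrative frame with a string, a float and an integer column.
df = pd.DataFrame({'Name': ['Jack', None], 'A': [1.0, np.nan], 'N': [1, 2]})

# Map each column name to the one-letter kind code of its dtype.
kinds = {col: df[col].dtype.kind for col in df.columns}
print(kinds)

# Columns whose kind is in 'biufc' would be filled with 0 by the lambda above.
numeric_like = [col for col, kind in kinds.items() if kind in 'biufc']
print(numeric_like)
```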
You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.
# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)
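A variant of the same idea, sketched here under the assumption that every non-numeric column should get '.', is to pick out the numeric columns with select_dtypes and treat the rest as strings. This avoids relying on string columns being stored with object dtype. The names num_cols and other_cols are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# Numeric columns by dtype; all remaining columns are treated as strings.
num_cols = df.select_dtypes(include='number').columns
other_cols = df.columns.difference(num_cols)

df[num_cols] = df[num_cols].fillna(0)
df[other_cols] = df[other_cols].fillna('.')
print(df)
```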
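Define a function:
import numpy as np

def myfillna(series):
    # Fill float columns with 0 and object (string) columns with '.'.
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        # Leave columns of any other dtype untouched.
        return series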
You can add other elif branches if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the DataFrame:
df = df.apply(myfillna)
Assigning the result back to df gives the same result as filling in place.