Fillna in multiple columns in place in Python Pandas

Tags: python, pandas, dataframe

I have a pandas DataFrame with mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in the string columns with '.', and the NaN values in the float columns with 0.

Consider this small fictitious example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.', inplace=True)
df['City'].fillna('.', inplace=True)
df.fillna(0, inplace=True)

Since this is a small DataFrame, 3 lines are probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?

asked Jan 21 '16 01:01

ozzy

4 Answers

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also lets you specify a different replacement for each column. To fill all remaining NaN values as well, chain a second fillna and assign the result back to df:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)

Edit (22 Apr 2021)

The original version of this answer chained two inplace fillna() calls:

df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)

That does not work in pandas 1.x and 2.x: with inplace=True, fillna returns None, so the second call fails. You can still chain, but you must assign the chain to df instead of modifying in place, as in the first snippet.
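For a frame with many columns, the dict passed to fillna can be built from the dtypes rather than typed by hand. A minimal sketch, assuming every numeric column should get 0 and every other column '.'; the name fill_values is only illustrative and not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jack", "Sue", np.nan, "Bob", "Alice", "John"],
    "A": [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4, 12.2, 14.4],
    "City": ["Seattle", "SF", "LA", "OC", np.nan, np.nan],
})

# Map each column to its fill value: 0 for numeric columns, '.' otherwise.
fill_values = {
    col: 0 if pd.api.types.is_numeric_dtype(df[col]) else "."
    for col in df.columns
}

# One call fills every column with its own replacement.
df = df.fillna(fill_values)
print(df)
```

Because the mapping is derived from the frame itself, adding more string or numeric columns does not require touching the fillna call.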

answered Oct 17 '22 12:10

Rob Bulmahn

You could use apply on the columns and check dtype.kind to decide whether each column is numeric:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
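The string 'biufc' lists NumPy's dtype kind codes for boolean, signed integer, unsigned integer, floating point, and complex types. A quick sketch for checking which code each column carries; the small frame and the names kinds and N are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", np.nan], "A": [1.0, np.nan], "N": [1, 2]})

# dtype.kind is a one-character code describing the general type of the data.
kinds = {col: dtype.kind for col, dtype in df.dtypes.items()}
print(kinds)

# The five numeric kinds that the lambda above tests for.
numeric_kinds = "".join(
    np.dtype(t).kind for t in (bool, int, "uint8", float, complex)
)
print(numeric_kinds)
```

Any column whose kind is not in that string (for example a column of strings) takes the '.' branch of the lambda.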

answered Oct 17 '22 12:10

Anton Protopopov

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)
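An alternative sketch of the same idea uses DataFrame.select_dtypes to pick out the numeric columns and treats everything else as a string column. This assumes there are no other dtypes (such as datetimes) that need separate handling; num_cols and other_cols are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jack", "Sue", np.nan, "Bob", "Alice", "John"],
    "A": [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4, 12.2, 14.4],
    "City": ["Seattle", "SF", "LA", "OC", np.nan, np.nan],
})

# Numeric columns by dtype; every remaining column is treated as text.
num_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(num_cols)

# Fill each group of columns in a single assignment.
df[num_cols] = df[num_cols].fillna(0)
df[other_cols] = df[other_cols].fillna(".")
print(df)
```

Selecting the numeric side first avoids depending on how string columns happen to be stored.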

answered Oct 17 '22 13:10

Bob Baxley

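Define a function:

import numpy as np

def myfillna(series):
    # Fill float columns with 0 and object (string) columns with '.'
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        # Leave every other dtype untouched
        return series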

You can add other elif branches if you want to fill columns of a different dtype in some other way. Now apply this function to every column of the DataFrame:

df = df.apply(myfillna)

The end result is the same as with inplace=True: df now refers to the filled DataFrame.
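Strictly speaking, apply returns a new DataFrame and the assignment rebinds the name df to it. A small sketch that keeps the result under a separate, hypothetical name (filled) to make the difference visible; the helper fill_by_dtype is an illustrative variant that uses pd.api.types.is_numeric_dtype rather than the dtype comparisons above, purely to keep the example short:

```python
import numpy as np
import pandas as pd

def fill_by_dtype(series):
    # Numeric columns get 0, everything else gets '.'
    if pd.api.types.is_numeric_dtype(series):
        return series.fillna(0)
    return series.fillna(".")

df = pd.DataFrame({
    "Name": ["Jack", "Sue", np.nan, "Bob", "Alice", "John"],
    "A": [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    "B": [0.25, np.nan, np.nan, 4, 12.2, 14.4],
    "City": ["Seattle", "SF", "LA", "OC", np.nan, np.nan],
})

# apply builds a new DataFrame; the original is left as it was.
filled = df.apply(fill_by_dtype)

print(df.isna().sum().sum())      # NaN count in the original frame
print(filled.isna().sum().sum())  # NaN count in the filled frame
```

If other code holds a reference to the original frame, it will not see the filled values unless it is handed the new object.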

answered Oct 17 '22 13:10

latorrefabian
