Why does pandas.DataFrame.update change the dtypes of the updated dataframe?

Tags: python, pandas
I would like the columns to keep their int dtype after the update. Any idea why they are converted to float64 instead?

import pandas as pd

df1 = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 'foo'},
    {'a': 3, 'b': 4, 'c': 'baz'},
])

df2 = pd.DataFrame([
    {'a': 1, 'b': 8, 'c': 'bar'},
])

print('dtypes before update:\n%s\n%s' % (df1.dtypes, df2.dtypes))

df1.update(df2)

print('\ndtypes after update:\n%s\n%s' % (df1.dtypes, df2.dtypes))

The output looks like this:

dtypes before update:
a     int64
b     int64
c    object
dtype: object
a     int64
b     int64
c    object
dtype: object

dtypes after update:
a    float64
b    float64
c     object
dtype: object
a     int64
b     int64
c    object
dtype: object

Thanks to anyone who has some advice.


asked Jan 29 '15 14:01

Brendan Maguire

1 Answer

This is a known issue: https://github.com/pydata/pandas/issues/4094. I think your only option currently is to call astype(int) on the affected columns after the update.
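A minimal sketch of that workaround, reusing df1 and df2 from the question. The variable names original_dtypes and aligned are mine, not part of pandas. The sketch records the dtypes before the update and casts back afterwards, so it should behave the same whether or not your pandas version upcasts. The reindex line illustrates what I believe is the likely cause: aligning df2 to df1's index leaves a missing row, and an int column cannot hold NaN, so it becomes float64.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 'foo'},
    {'a': 3, 'b': 4, 'c': 'baz'},
])
df2 = pd.DataFrame([
    {'a': 1, 'b': 8, 'c': 'bar'},
])

# Aligning df2 to df1's index adds a row of NaN for index 1,
# which forces the int columns of the aligned frame to float64.
aligned = df2.reindex(df1.index)
print(aligned.dtypes)

# Remember the dtypes before the update (column name -> dtype).
original_dtypes = df1.dtypes.to_dict()

df1.update(df2)

# Cast every column back to the dtype it had before the update.
# This assumes no NaN ended up in the int columns; astype would raise otherwise.
df1 = df1.astype(original_dtypes)
print(df1.dtypes)
```

Passing a dict to astype casts each column separately, which avoids calling astype(int) on the whole frame (that would fail on the string column c).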

answered Nov 04 '22 00:11

JAB
