Prevent coercion of pandas data frames while indexing and inserting rows

Tags: python, pandas, coercion

I'm working with individual rows of pandas data frames, but I'm stumbling over coercion issues while indexing and inserting rows. Pandas seems to always want to coerce from a mixed int/float to all-float types, and I can't see any obvious controls on this behaviour.

For example, here is a simple data frame with a as int and b as float:

import pandas as pd
pd.__version__  # '0.25.2'

df = pd.DataFrame({'a': [1], 'b': [2.2]})
print(df)
#    a    b
# 0  1  2.2
print(df.dtypes)
# a      int64
# b    float64
# dtype: object

Here is a coercion issue while indexing one row:

print(df.loc[0])
# a    1.0
# b    2.2
# Name: 0, dtype: float64
print(dict(df.loc[0]))
# {'a': 1.0, 'b': 2.2}

And here is a coercion issue while inserting one row:

df.loc[1] = {'a': 5, 'b': 4.4}
print(df)
#      a    b
# 0  1.0  2.2
# 1  5.0  4.4
print(df.dtypes)
# a    float64
# b    float64
# dtype: object

In both instances, I want the a column to remain as an integer type, rather than being coerced to a float type.

307

asked Oct 23 '19 23:10

Mike T

3 Answers

After some digging, here are some terribly ugly workarounds. (A better answer will be accepted.)

A quirk I found is that the presence of a non-numeric column stops the coercion, so here is how to index one row to a dict:

dict(df.assign(_='').loc[0].drop('_', axis=0))
# {'a': 1, 'b': 2.2}

And inserting a row can be done by creating a new data frame with one row:

df = df.append(pd.DataFrame({'a': 5, 'b': 4.4}, index=[1]))
print(df)
#    a    b
# 0  1  2.2
# 1  5  4.4

Neither of these tricks is optimised for large data frames, so I would greatly appreciate a better answer!
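For readers on newer pandas releases: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the append-based workaround above no longer runs there. A minimal sketch of the same idea with pd.concat, assuming the new row is built as a one-row data frame exactly as in the workaround (the name new_row is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2.2]})

# Build the new row as a one-row frame so each column keeps its own dtype.
new_row = pd.DataFrame({'a': [5], 'b': [4.4]}, index=[1])

# Concatenating two frames with matching column dtypes does not upcast 'a'.
df = pd.concat([df, new_row])
print(df.dtypes)
```

Like the original trick, this copies the whole frame on every call, so it is still not suited to inserting many rows one at a time.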

102

answered Oct 19 '22 17:10

Mike T

Whenever you read data from a DataFrame or append data to one and need the dtypes to stay the same, avoid converting to intermediate structures that cannot hold a separate dtype per column.

When you do df.loc[0], the row is converted to a pd.Series:

>>> type(df.loc[0])
<class 'pandas.core.series.Series'>

A Series can only have a single dtype, so the int is coerced to float.

Instead, keep the structure as a pd.DataFrame:

>>> type(df.loc[[0]])
<class 'pandas.core.frame.DataFrame'>

Select the row you need as a frame and then convert it to a dict:

>>> df.loc[[0]].to_dict(orient='records')
[{'a': 1, 'b': 2.2}]
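The records list above has one entry per selected row, so a single row still needs to be unwrapped. A small sketch of that step, where row_as_dict is a hypothetical helper name and not part of pandas:

```python
import pandas as pd


def row_as_dict(frame, label):
    """Return one row as a dict without passing through a single-dtype Series."""
    # frame.loc[[label]] (note the list) keeps a DataFrame, so each column
    # keeps its dtype; 'records' yields one dict per row, and we take the first.
    return frame.loc[[label]].to_dict(orient='records')[0]


df = pd.DataFrame({'a': [1], 'b': [2.2]})
row = row_as_dict(df, 0)
print(row)
```

The helper assumes the label is unique in the index; with duplicate labels it would return only the first matching row.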

Similarly, to add a new row, use the pd.DataFrame.append function:

>>> df = df.append([{'a': 5, 'b': 4.4}])  # NOTE: wrap the dict in a list to append it as a row
>>> df
   a    b
0  1  2.2
0  5  4.4

The above will not cause type conversion:

>>> df.dtypes
a      int64
b    float64
dtype: object

answered Oct 19 '22 17:10

Vishnudev

The root of the problem is that indexing a single row of a pandas DataFrame returns a pandas Series. We can see that:

type(df.loc[0])
# pandas.core.series.Series

A Series can only have one dtype, so a row that mixes int64 and float64 values is upcast to float64.

Two workarounds come to mind:

print(df.loc[[0]])
# this will return a dataframe instead of series
# so the result will be
#    a    b
# 0  1  2.2

# but the dictionary is hard to read
print(dict(df.loc[[0]]))
# {'a': 0    1
# Name: a, dtype: int64, 'b': 0    2.2
# Name: b, dtype: float64}

print(df.astype(object).loc[0])
# this will change the type of value to object first and then print
# so the result will be
# a      1
# b    2.2
# Name: 0, dtype: object

print(dict(df.astype(object).loc[0]))
# in this way the dictionary is as expected
# {'a': 1, 'b': 2.2}
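When many rows need to be read this way, a third option may be worth considering: DataFrame.itertuples builds each row from the individual columns rather than from a row Series, so the values are not upcast. A minimal sketch, assuming the column names are valid Python identifiers (as 'a' and 'b' are), since itertuples returns namedtuples:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5], 'b': [2.2, 4.4]})

# Each namedtuple field comes from its own column, so 'a' stays an integer.
rows = [row._asdict() for row in df.itertuples(index=False)]

for row in rows:
    print(dict(row))
```

This avoids the full-frame copy made by df.astype(object), at the cost of iterating in Python.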

When you append a dictionary to a DataFrame, it is converted to a Series first and then appended, so the same problem happens again:

https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L6973

if isinstance(other, dict):
    other = Series(other)
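The effect of that conversion can be checked on its own, without any data frame involved. A short sketch (the variable names are only illustrative):

```python
import pandas as pd

row = {'a': 5, 'b': 4.4}

# With no dtype given, pandas picks one dtype that fits every value: float64.
as_float = pd.Series(row)
print(as_float.dtype)

# With dtype=object, each value is stored as the Python object it already was.
as_object = pd.Series(row, dtype=object)
print(as_object.dtype, type(as_object['a']))
```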

So your workaround is actually a solid one, or else we could do:

df.append(pd.Series({'a': 5, 'b': 4.4}, dtype=object, name=1))
#    a    b
# 0  1  2.2
# 1  5  4.4
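Another possibility, which is a different technique from the ones above, is to let the insertion happen and then cast the columns back to the dtypes they had before. This is only a sketch and it assumes every value in the integer column is a whole number; casting a float column with fractional values back to an integer dtype would truncate them, and missing values would raise an error.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [2.2]})

# Remember the per-column dtypes before the insertion.
original_dtypes = df.dtypes.to_dict()

# Insert the new row by label; depending on the pandas version this may
# upcast column 'a' to float64.
df.loc[1] = [5, 4.4]

# Cast every column back to the dtype it had before (assumption: lossless).
df = df.astype(original_dtypes)
print(df.dtypes)
```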

answered Oct 19 '22 19:10

Hongpei
