Why does Pandas coerce my numpy float32 to float64?

Tags: python, pandas, numpy, coercion

Why does Pandas coerce my numpy float32 to float64 in this piece of code:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
>>> A = df.ix[:, 0:1].values
>>> df.ix[:, 0:1] = A
>>> df[0].dtype
dtype('float64')

The behavior seems so odd to me that I wonder if it is a bug. I am on Pandas version 0.17.1 (updated PyPI version), and I note that coercion bugs have been addressed recently; see https://github.com/pydata/pandas/issues/11847. I haven't tried this code with an updated GitHub master.

Is it a bug or do I misunderstand some "feature" in Pandas? If it is a feature, then how do I get around it?

(The coercion problem relates to a question I recently asked about the performance of Pandas assignments: "Assignment of Pandas DataFrame with float32 and float64 slow".)

asked Feb 05 '16 17:02

Finn Årup Nielsen

2 Answers

I think it is worth posting this as a GitHub issue. The behavior is certainly inconsistent.

The code takes a different branch based on whether the DataFrame is mixed-type or not (source).

In the mixed-type case, the ndarray is converted to a Python list of float64 numbers and then converted back into a float64 ndarray, disregarding the DataFrame's dtype information (function maybe_convert_objects()).
In the non-mixed-type case the DataFrame content is updated pretty much directly (source) and the DataFrame keeps its float32 dtypes.
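The round trip described above can be reproduced with plain NumPy, and the dtype can be restored afterwards with an explicit cast. This is only a sketch: it does not call the pandas internals, the DataFrame is built column by column (recent pandas versions reject dtype=np.float32 for data containing strings), and it uses .iloc because .ix was removed in later releases. Whether the assignment itself upcasts depends on the pandas version, so the cast at the end is the part to rely on.

```python
import numpy as np
import pandas as pd

# Plain NumPy illustration of the round trip:
# float32 array -> Python list of floats -> array again.
A = np.array([[1, 2], [3, 4]], dtype=np.float32)
round_tripped = np.array(A.tolist())
print(A.dtype, "->", round_tripped.dtype)  # float32 -> float64

# Mixed-type frame: two float32 columns and one string column.
df = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})

# Positional assignment of the float32 block; this may or may not
# upcast to float64, depending on the pandas version.
df.iloc[:, 0:2] = A

# Possible workaround: cast the affected columns back explicitly.
df = df.astype({0: np.float32, 1: np.float32})
print(df.dtypes)
```

The explicit astype costs an extra copy of the two columns, but it makes the result independent of which branch the assignment took.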

answered Oct 29 '22 13:10

Martin Valgur

Not an answer, but my recreation of the problem:

In [2]: df = pd.DataFrame([[1, 2, 'a'], [3, 4, 'b']], dtype=np.float32)
In [3]: df.dtypes
Out[3]: 
0    float32
1    float32
2     object
dtype: object
In [4]: A=df.ix[:,:1].values
In [5]: A
Out[5]: 
array([[ 1.,  2.],
       [ 3.,  4.]], dtype=float32)
In [6]: df.ix[:,:1] = A
In [7]: df.dtypes
Out[7]: 
0    float64
1    float64
2     object
dtype: object
In [8]: pd.__version__
Out[8]: '0.15.0'

I'm not as familiar with pandas as numpy, but I'm puzzled as to why ix[:,:1] gives me a 2 column result. In numpy that sort of indexing gives just 1 column.

If I assign a single column, the dtype does not change:

In [47]: df.ix[:,[0]]=A[:,0]
In [48]: df.dtypes
Out[48]: 
0    float32
1    float32
2     object

The same actions without mixed datatypes do not change the dtypes:

In [100]: df1 = pd.DataFrame([[1, 2, 1.23], [3, 4, 3.32]], dtype=np.float32)
In [101]: A1=df1.ix[:,:1].values
In [102]: df1.ix[:,:1]=A1
In [103]: df1.dtypes
Out[103]: 
0    float32
1    float32
2    float32
dtype: object

The key must be that with mixed values, the dataframe is, in one sense or other, a dtype=object array, whether that's true of its internal data storage, or just its numpy interface.

In [104]: df1.as_matrix()
Out[104]: 
array([[ 1.        ,  2.        ,  1.23000002],
       [ 3.        ,  4.        ,  3.31999993]], dtype=float32)
In [105]: df.as_matrix()
Out[105]: 
array([[1.0, 2.0, 'a'],
       [3.0, 4.0, 'b']], dtype=object)
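For readers on a newer pandas, where as_matrix() is no longer available, the same comparison can be sketched with to_numpy(). The frames below are my own reconstruction of df and df1 (named mixed and uniform here), built so that they also work on versions that refuse dtype=np.float32 for string data.

```python
import numpy as np
import pandas as pd

# Mixed-type frame: float32 columns next to a string column.
mixed = pd.DataFrame({
    0: np.array([1, 3], dtype=np.float32),
    1: np.array([2, 4], dtype=np.float32),
    2: ["a", "b"],
})

# Uniform frame: every column is float32.
uniform = pd.DataFrame(
    np.array([[1, 2, 1.23], [3, 4, 3.32]], dtype=np.float32)
)

# The combined NumPy view needs one dtype that fits all columns.
print(mixed.to_numpy().dtype)    # object
print(uniform.to_numpy().dtype)  # float32

# The individual columns of the mixed frame are still float32.
print(mixed[0].dtype)            # float32
```

So the object dtype shows up in the combined array, while each numeric column keeps its own float32 storage.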

answered Oct 29 '22 14:10

hpaulj
