dtype changes when using DataFrame.to_dict

Tags:

python

pandas

I have a uint64 column in my DataFrame, but when I convert the DataFrame to a list of Python dicts using DataFrame.to_dict('records'), the values that were uint64 are silently converted to float:

In [24]: mid['bd_id'].head()
Out[24]:
0                0
1    6957860914294
2    7219009614965
3    7602051814214
4    7916807114255
Name: bd_id, dtype: uint64

In [25]: mid.to_dict('records')[2]['bd_id']
Out[25]: 7219009614965.0

In [26]: bd = mid['bd_id']

In [27]: bd.head().to_dict()
Out[27]: {0: 0, 1: 6957860914294, 2: 7219009614965, 3: 7602051814214, 4: 7916807114255}

How can I avoid this strange behavior?

Update:

Strangely enough, if I use to_dict() instead of to_dict('records'), the bd_id values come back as int:

In [43]: mid.to_dict()['bd_id']
Out[43]:
{0: 0,
 1: 6957860914294,
 2: 7219009614965,
...
timfeirg asked Jul 13 '15 03:07



2 Answers

It's because another column has a float in it. More specifically, to_dict('records') is implemented using the values attribute of the DataFrame rather than the columns themselves. Building that single array forces every column into one common dtype ("implicit upcasting"), which in your case converts uint64 to float.

If you want to get around this bug, you could explicitly cast your DataFrame to the object dtype first:

df.astype(object).to_dict('records')[2]['bd_id']
Out[96]: 7602051814214
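
To make the upcasting visible, here is a minimal sketch. The frame below is a made-up stand-in for the question's `mid` (the `score` column is an assumption; the original frame's other columns are not shown). It only checks what `.values` does and what the `astype(object)` workaround returns. Whether a plain `to_dict('records')` still upcasts depends on your pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `mid` frame: one uint64 column
# next to a float column (the `score` column is an assumption).
mid = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

# A single 2-D array can hold only one dtype, so uint64 and float64
# are combined into float64 when the whole frame is turned into an array.
values_dtype = mid.values.dtype
print(values_dtype)

# Casting to object first keeps every cell as its own Python object,
# so the integer column is no longer merged with the float column.
records = mid.astype(object).to_dict("records")
bd_id = records[2]["bd_id"]
print(bd_id, type(bd_id))
```

The trade-off is that an object-dtype copy of the whole frame is created, which may be slow or memory-hungry for large frames.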

By the way, if you are using IPython and want to see how a library function is implemented, you can bring up its source by appending ?? to its name. For pd.DataFrame.to_dict?? we see:

    ...
    elif orient.lower().startswith('r'):
        return [dict((k, v) for k, v in zip(self.columns, row))
                for row in self.values]
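
Since the snippet above iterates over `self.values`, one alternative is to build the records column by column so that no combined array is ever created. The helper below is only a sketch, not part of pandas: `records_by_column` is a hypothetical name, the sample frame is made up, and unique column names are assumed.

```python
import numpy as np
import pandas as pd

def records_by_column(df):
    """Build a list of row dicts without going through df.values.

    Assumes column names are unique.
    """
    # Series.tolist() converts each column on its own to Python scalars,
    # so an integer column is never merged with a float column.
    columns = [df[name].tolist() for name in df.columns]
    return [dict(zip(df.columns, row)) for row in zip(*columns)]

# Hypothetical sample data: a uint64 column next to a float column.
mid = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

records = records_by_column(mid)
print(records[2])
```

Compared with `astype(object)`, this avoids copying the frame into an object-dtype frame, at the cost of one Python list per column.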
maxymoo answered Sep 19 '22 16:09


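You can round-trip the DataFrame through the JSON encoder bundled with pandas. Note that double_precision=0 encodes every float with zero decimal places, so genuine float columns are rounded as well:

# The import path of this encoder may differ between pandas versions.
from pandas.io.json import dumps
import json

output = json.loads(dumps(mid, double_precision=0))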
Saurabh answered Sep 19 '22 16:09