I have a uint64 column in my DataFrame, but when I convert that DataFrame to a list of Python dicts using DataFrame.to_dict('record'), what was previously a uint64 gets magically converted to float:
In [24]: mid['bd_id'].head()
Out[24]:
0 0
1 6957860914294
2 7219009614965
3 7602051814214
4 7916807114255
Name: bd_id, dtype: uint64
In [25]: mid.to_dict('record')[2]['bd_id']
Out[25]: 7219009614965.0
In [26]: bd = mid['bd_id']
In [27]: bd.head().to_dict()
Out[27]: {0: 0, 1: 6957860914294, 2: 7219009614965, 3: 7602051814214, 4: 7916807114255}
How can I avoid this strange behavior?
Strangely enough, if I use to_dict() instead of to_dict('records'), the bd_id column will be of type int:
In [43]: mid.to_dict()['bd_id']
Out[43]:
{0: 0,
1: 6957860914294,
2: 7219009614965,
...
The to_dict() method converts a DataFrame into a dictionary of Series or list-like data, depending on the orient parameter. Parameters: orient: string value ('dict', 'list', 'series', 'split', 'records', 'index') that determines the type of the values in the resulting dictionary.
A column in a DataFrame can only have one data type. The data type of a single DataFrame column can be checked using its dtype attribute.
The dtype specified can be a built-in Python, numpy, or pandas dtype. Let's suppose we want to convert column A (which currently holds strings, with dtype object) into a column holding integers. To do so, we simply need to call astype on the pandas DataFrame object and explicitly define the dtype we wish to cast the column to.
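As a minimal sketch of the astype call described above, assuming a small made-up DataFrame whose column A holds numeric strings (the data is hypothetical and only there for illustration):

```python
import pandas as pd

# Hypothetical example data: column A holds numbers stored as strings.
df = pd.DataFrame({"A": ["1", "2", "3"]})

# Cast the column explicitly; "int64" is spelled out so the result
# does not depend on the platform's default integer size.
df["A"] = df["A"].astype("int64")

after = df["A"].dtype
print(after)
```

The cast raises a ValueError if any string cannot be parsed as an integer, so it is only appropriate when the column is known to be clean.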
The main types stored in pandas objects are float, int, bool, datetime64[ns], timedelta64[ns], and object. In addition these dtypes have item sizes, e.g. int64 and int32. By default integer types are int64 and float types are float64, REGARDLESS of platform (32-bit or 64-bit).
It's because another column has a float in it. More specifically, to_dict('records') is implemented using the values attribute of the data frame rather than the columns themselves, and values performs "implicit upcasting" to a single common dtype, in your case converting uint64 to float.
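The upcast can be observed directly. The sketch below uses a small stand-in for mid (the score column and its values are assumptions, since the question does not show the other columns) and compares the column's own dtype with the dtype of the combined values array:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `mid`: one uint64 column plus one float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype=np.uint64),
    "score": [0.5, 1.5, 2.5],
})

column_dtype = df["bd_id"].dtype     # the column itself keeps its integer dtype
block_dtype = df.values.dtype        # the 2-D array must use one common dtype
value_from_block = df.values[2, 0]   # the same cell, read through .values

print(column_dtype, block_dtype, type(value_from_block))
```

Because uint64 and float64 have no common integer type, NumPy settles on float64 for the whole array, which is where the trailing .0 comes from.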
If you want to get around this bug, you could explicitly cast your dataframe to the object datatype:
df.astype(object).to_dict('record')[2]['bd_id']
Out[96]: 7602051814214
By the way, if you are using IPython and you want to see how a function is implemented in a library, you can bring it up by putting ?? at the end of the method call. For pd.DataFrame.to_dict?? we see
...
elif orient.lower().startswith('r'):
return [dict((k, v) for k, v in zip(self.columns, row))
for row in self.values]
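Since the excerpt above builds each record from a row of self.values, one way to sidestep the upcast is to build the records column by column instead. This is only a sketch of that idea, not how pandas implements it; the DataFrame and its score column are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `mid`: one uint64 column plus one float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype=np.uint64),
    "score": [0.5, 1.5, 2.5],
})

# Convert each column on its own, so no common dtype is ever needed.
# Series.tolist() returns native Python scalars (int, float, ...).
columns = {name: col.tolist() for name, col in df.items()}

# Re-assemble the per-column lists into one dict per row.
records = [dict(zip(columns, row)) for row in zip(*columns.values())]

print(records[2])
```

Each column is converted with its own dtype, so bd_id stays an integer while score stays a float.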
You can use this:
from pandas.io.json import dumps
import json
output=json.loads(dumps(mid,double_precision=0))
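A similar round trip can be sketched with the public DataFrame.to_json method instead of the internal dumps helper, which may not be importable in every pandas version. The DataFrame below is a hypothetical stand-in for mid:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for `mid`: one uint64 column plus one float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype=np.uint64),
    "score": [0.5, 1.5, 2.5],
})

# Serialize row by row to JSON text, then parse it back into Python objects.
# Integers are written without a decimal point, so json.loads returns ints.
records = json.loads(df.to_json(orient="records"))

print(records[2])
```

With the default settings, fractional values in float columns are kept, whereas double_precision=0 in the snippet above would round them to whole numbers.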