get python pandas to_dict with orient='records' but without float cast

Tags:

python

pandas

I have a DataFrame with one int column and one float column:

df
#    a      b
# 0  3  42.00
# 1  2   3.14

df.dtypes
# a      int64
# b    float64
# dtype: object

I want a list of dicts like the one provided by df.to_dict(orient='records'):

df.to_dict(orient='records')
[{'a': 3.0, 'b': 42.0}, {'a': 2.0, 'b': 3.1400000000000001}]

But with a kept as int, not cast to float.

user3313834 asked Jun 18 '16 13:06



2 Answers

Currently (as of Pandas version 0.18), df.to_dict('records') accesses the NumPy array df.values. This property upcasts the dtype of the int column to float so that the array can have a single common dtype. After this point there is no hope of returning the desired result -- all the ints have been converted to floats.
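
A minimal sketch of that upcast, using the same two-column frame as in the question. The internals of to_dict may differ between pandas versions, but the common-dtype behaviour of df.values can be checked directly:

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2], 'b': [42.0, 3.14]})

# Inside the DataFrame, each column keeps its own dtype.
column_dtypes = df.dtypes.astype(str).to_dict()

# df.values has to fit both columns into a single 2-D array,
# so the int column is upcast to the common float dtype.
values_dtype = str(df.values.dtype)

print(column_dtypes)
print(values_dtype)
```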

So instead, building on ayhan's and Tom Augspurger's suggestions, you could use a list and dict comprehension:

import pandas as pd

df = pd.DataFrame({'a':[3,2], 'b':[42.0,3.14]})
result = [{col:getattr(row, col) for col in df} for row in df.itertuples()]
print(result)
# [{'a': 3, 'b': 42.0}, {'a': 2, 'b': 3.1400000000000001}]
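
One caveat: itertuples() renames fields whose column names are not valid Python identifiers, so getattr(row, col) no longer finds them. A variant that sidesteps this, offered as a sketch rather than as part of the original suggestion, asks for plain tuples and zips them with the column labels. The column name 'b c' is made up for illustration:

```python
import pandas as pd

# 'b c' is a hypothetical column name that is not a valid identifier.
df = pd.DataFrame({'a': [3, 2], 'b c': [42.0, 3.14]})

# name=None yields plain tuples; index=False drops the index value,
# so each tuple lines up one-to-one with df.columns.
records = [dict(zip(df.columns, row))
           for row in df.itertuples(index=False, name=None)]
print(records)
```

Because each column is iterated separately, the int column is never merged into a common float array.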
unutbu answered Oct 18 '22 22:10


Another horrible workaround is to (temporarily) add a non-numeric column, e.g. starting with:

df = pd.DataFrame([[1, 2.4], [3, 4.0]], columns='a b'.split())

then df.to_dict(orient='records') promotes to floats, but if you do:

df['foo'] = 'bar'
[{k: v for (k, v) in row.items() if k != 'foo'} for row in df.to_dict(orient='records')]

you preserve the original types. I notice that df.reindex() behaves similarly, as explained in the Pandas gotchas, but you can't work around it unless you fill with non-null values, e.g. fill_value=0.
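
The reindex() behaviour mentioned above can be sketched as follows, assuming the same small frame. The extra label 2 is hypothetical and only there to force a missing row:

```python
import pandas as pd

df = pd.DataFrame([[1, 2.4], [3, 4.0]], columns='a b'.split())

# Label 2 does not exist, so the new row is filled with NaN,
# which forces the int column to become float.
with_nan = df.reindex([0, 1, 2])

# An integer fill value avoids NaN, so the int dtype survives.
with_fill = df.reindex([0, 1, 2], fill_value=0)

print(with_nan.dtypes)
print(with_fill.dtypes)
```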

patricksurry answered Oct 18 '22 22:10