I am trying to call the to_dict function on the following DataFrame:
import pandas as pd
data = {"a": [1,2,3,4,5], "b": [90,80,40,60,30]}
df = pd.DataFrame(data)
   a   b
0  1  90
1  2  80
2  3  40
3  4  60
4  5  30
df.reset_index().to_dict("records")  # "r" was shorthand for "records" in older pandas
[{'a': 1, 'b': 90, 'index': 0},
{'a': 2, 'b': 80, 'index': 1},
{'a': 3, 'b': 40, 'index': 2},
{'a': 4, 'b': 60, 'index': 3},
{'a': 5, 'b': 30, 'index': 4}]
However, my problem occurs if I perform a float operation on the DataFrame, which turns the index values into floats:
(df*1.0).reset_index().to_dict("records")
[{'a': 1.0, 'b': 90.0, 'index': 0.0},
{'a': 2.0, 'b': 80.0, 'index': 1.0},
{'a': 3.0, 'b': 40.0, 'index': 2.0},
{'a': 4.0, 'b': 60.0, 'index': 3.0},
{'a': 5.0, 'b': 30.0, 'index': 4.0}]
Can anyone explain the above behaviour, recommend a workaround, or verify whether this could be a pandas bug? None of the other outtypes in the to_dict method mutates the index as shown above.
I've replicated this on both pandas 0.14 and 0.18 (latest).
Many thanks!
To change the type of a DataFrame's index in pandas, call DataFrame.index.astype() and assign the result back to DataFrame.index.
The to_dict() method converts a DataFrame into a dictionary of Series or list-like data, depending on the orient parameter. Parameters: orient: string value ('dict', 'list', 'series', 'split', 'records', 'index') that defines the type the columns (Series) are converted into.
To reset the index in pandas, chain .reset_index() onto the DataFrame object. After .reset_index() is applied, the old index is moved into the DataFrame as a separate column.
A pandas Index is an immutable ndarray implementing an ordered, sliceable set. It is the basic object that stores the axis labels for all pandas objects. The Index.dtype attribute returns the data type (dtype) of the underlying data of the given Index object.
This question has been answered on GitHub.
I will convey the answer here so the question may be marked as solved and moved off the top-list of unanswered pandas questions.
From GitHub:
Nothing to do with the index, just the fact that you have any float dtypes in the data
If you look at the code, we use DataFrame.values, which returns a NumPy array, which must have a single dtype (float64 in this case).
--TomAugspurger
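The upcasting described in the quote can be checked directly. This is only a minimal sketch, assuming a small frame with one float column and one integer column (the frame and the variable names are illustrative, not from the GitHub thread). It inspects the dtypes of the individual columns and of the single array returned by .values:

```python
import pandas as pd

# Illustrative frame: column "a" holds floats, column "b" holds ints.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [90, 80, 40]})

# reset_index() moves the integer index into a regular column named "index".
reset = df.reset_index()

# Each column keeps its own dtype inside the DataFrame.
column_dtypes = {name: str(dtype) for name, dtype in reset.dtypes.items()}
print(column_dtypes)

# .values has to fit every column into one NumPy array with a single dtype,
# so the integer columns (including the former index) are upcast to float.
values_dtype = str(reset.values.dtype)
print(values_dtype)
```

Any code path that builds its records from .values will therefore see the index as floats, even though the "index" column itself is still an integer column. Newer pandas releases may build the records differently, so the to_dict output itself can vary by version.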
A workaround for the problem would be:
[x._asdict() for x in df.itertuples()]
This generates a list of OrderedDict objects (on Python 3.8 and later, _asdict() returns plain dict objects instead):
[OrderedDict([('Index', 0), ('a', 1.0), ('b', 90)]),
OrderedDict([('Index', 1), ('a', 2.0), ('b', 80)]),
OrderedDict([('Index', 2), ('a', 3.0), ('b', 40)]),
OrderedDict([('Index', 3), ('a', 4.0), ('b', 60)]),
OrderedDict([('Index', 4), ('a', 5.0), ('b', 30)])]
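For completeness, here is a small self-contained version of the workaround. It is a sketch under assumptions: the frame below (float column "a", integer column "b") is made up for illustration, and dict() is wrapped around _asdict() only so the result is a plain dict regardless of Python version:

```python
import pandas as pd

# Illustrative frame: column "a" holds floats, column "b" holds ints.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [90, 80]})

# itertuples() yields one namedtuple per row, with the index in the
# "Index" field. Values are read column by column, so the row is never
# forced into a single common dtype.
records = [dict(row._asdict()) for row in df.itertuples()]

for record in records:
    print(record)
```

Note that the index key is "Index" (capitalised) here, rather than the "index" column name produced by reset_index().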