Pandas to_dict changes index type with outtype='records'

I am trying to call the to_dict function on the following DataFrame:

import pandas as pd

data = {"a": [1,2,3,4,5], "b": [90,80,40,60,30]}

df = pd.DataFrame(data)

   a   b
0  1  90
1  2  80
2  3  40
3  4  60
4  5  30

df.reset_index().to_dict("r")

[{'a': 1, 'b': 90, 'index': 0},
 {'a': 2, 'b': 80, 'index': 1},
 {'a': 3, 'b': 40, 'index': 2},
 {'a': 4, 'b': 60, 'index': 3},
 {'a': 5, 'b': 30, 'index': 4}]

However, the problem appears when I perform a float operation on the DataFrame: the index values in the output become floats as well:

(df*1.0).reset_index().to_dict("r")

[{'a': 1.0, 'b': 90.0, 'index': 0.0},
 {'a': 2.0, 'b': 80.0, 'index': 1.0},
 {'a': 3.0, 'b': 40.0, 'index': 2.0},
 {'a': 4.0, 'b': 60.0, 'index': 3.0},
 {'a': 5.0, 'b': 30.0, 'index': 4.0}]

Can anyone explain this behaviour, recommend a workaround, or confirm whether this is a pandas bug? None of the other outtypes of the to_dict method changes the index like this.

I've replicated this on both pandas 0.14 and 0.18 (latest).

Many thanks!

Tsu-Shiuan Lin asked Apr 11 '16 12:04


1 Answer

This question has been answered on GitHub.

I will convey the answer here so the question may be marked as solved and moved off the top-list of unanswered pandas questions.

From GitHub:

Nothing to do with the index, just the fact that you have any float dtypes in the data

If you look at the code, we use DataFrame.values, which returns a NumPy array, which must have a single dtype (float64 in this case).

--TomAugspurger
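
To see the effect described in the quote, it may help to look at the dtypes directly. This is only a minimal sketch: the names mixed, column_dtypes, values_dtype and first_row are mine, not from the GitHub thread, and since newer pandas releases may build the records differently, only the .values behaviour is shown here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})

# After the float operation, reset_index() puts an integer "index" column
# next to two float64 data columns.
mixed = (df * 1.0).reset_index()
column_dtypes = mixed.dtypes.to_dict()

# .values has to fit every column into one NumPy array with a single dtype,
# so the integer index is upcast to float64 together with the data.
values_dtype = mixed.values.dtype
first_row = mixed.values[0]

print(column_dtypes)
print(values_dtype)
print(first_row)
```

So the index column itself is still an integer column in the DataFrame; it only turns into floats at the point where the whole frame is converted to one array.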

A workaround for the problem would be:

[x._asdict() for x in df.itertuples()]

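This generates a list of OrderedDict objects (plain dicts on Python 3.8 and later, where namedtuple._asdict() returns a dict), with each value keeping the dtype of its own column: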

[OrderedDict([('Index', 0), ('a', 1.0), ('b', 90)]),
 OrderedDict([('Index', 1), ('a', 2.0), ('b', 80)]),
 OrderedDict([('Index', 2), ('a', 3.0), ('b', 40)]),
 OrderedDict([('Index', 3), ('a', 4.0), ('b', 60)]),
 OrderedDict([('Index', 4), ('a', 5.0), ('b', 30)])]
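
For completeness, here is a self-contained version of the workaround applied to the frame from the question. Treat it as a sketch: the names floated and records are illustrative, and wrapping each row in dict() is my own addition to get plain dicts regardless of the Python version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [90, 80, 40, 60, 30]})
floated = df * 1.0

# itertuples() yields one namedtuple per row, assembled column by column,
# so the index is never pushed through a single float array.
# The index field is named "Index" by default.
records = [dict(row._asdict()) for row in floated.itertuples()]

for record in records:
    print(record)
```

Note that the key is 'Index' (capitalised) rather than the 'index' produced by reset_index(), so rename it afterwards if downstream code expects the lowercase name.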

firelynx answered Oct 06 '22 00:10