
Pandas to_dict unwantedly modifying float numbers

My code below takes in CSV data and uses the pandas to_dict() function as one step in converting the data to JSON. The problem is that it modifies the float numbers (e.g. 1.6 becomes 1.6000000000000001). I am not concerned about the loss of accuracy, but because users will see the change in the numbers, it looks amateurish.

I am aware:

  • this has come up here before, but that was two years ago and it was never really answered well;
  • I also have an additional complication: the data frames I want to convert to dictionaries could contain any combination of datatypes.

As such, the issues with the previous solutions are:

  1. Converting all the numbers to objects only works if you don't need to use the numbers numerically. I want the option to calculate sums and averages, which reintroduces the extra-decimals issue.
  2. Forcing the numbers to be rounded to x decimals will either reduce accuracy or add unnecessary trailing 0s, depending on the data the user provides.

My question:

Is there a better way to ensure the numbers are not being modified, but are kept in a numeric datatype? Is it a question of changing how I import the CSV data in the first place? Surely there is a simple solution I am overlooking?

Here is a simple script that will reproduce this bug:

import pandas as pd

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

CSV_Data = "Index,Column_1,Column_2,Column_3,Column_4,Column_5,Column_6,Column_7,Column_8\nindex_1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8\nindex_2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8\nindex_3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8\nindex_4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8"

input_data = StringIO(CSV_Data)
# Parse the CSV text, using the first column as the index
df = pd.read_csv(input_data, header=0, sep=',', index_col=0)
print(df.to_dict(orient = 'records'))
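
As a side note on where digits like these can come from, here is a minimal standard-library sketch. It assumes the extra digits are a display effect of printing a double with 17 significant digits, which may or may not be the whole story for the script above; a CSV float parser that is off in the last bit is another possible cause.

```python
# 1.6 cannot be stored exactly as a binary double. Python's repr prints the
# shortest string that converts back to the same stored double.
value = 1.6
short = repr(value)           # shortest round-trip form: '1.6'
full = format(value, ".17g")  # 17 significant digits: '1.6000000000000001'

print(short)
print(full)

# Both strings name the same stored double, so nothing was gained or lost.
same_double = float(full) == float(short)
print(same_double)  # True
```

If that is what is happening, the stored value is unchanged and only the printed form differs.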
Brett Romero asked Apr 18 '16 13:04



1 Answer

I needed df.to_dict('list') with the correct float values, but df.to_json() doesn't support orient='list' yet, so I do the following:

import json

# Round-trip through JSON, then rebuild a {column: [values]} dict
list_oriented_dict = {
    column: list(data.values())
    for column, data in json.loads(df.to_json()).items()
}

Not the best way, but it works for me. Maybe someone has a more elegant solution?
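
For reference, a minimal self-contained sketch of the approach above, run against a shortened version of the sample data from the question. It relies on to_json() rounding floats to 10 decimal places by default (its double_precision parameter), after which json.loads returns plain Python floats. The float_precision='round_trip' argument to read_csv is an assumption about a possible import-side alternative, not something the question or this answer confirms, and the variable names are illustrative.

```python
import json
from io import StringIO

import pandas as pd

# Shortened version of the sample data from the question.
csv_data = (
    "Index,Column_1,Column_6\n"
    "index_1,1.1,1.6\n"
    "index_2,2.1,2.6\n"
)

# Assumption: float_precision='round_trip' makes the C parser convert floats
# so that they round-trip exactly, which may avoid drift at parse time.
df = pd.read_csv(StringIO(csv_data), index_col=0, float_precision="round_trip")

# to_json() rounds to 10 decimal places by default (double_precision=10).
as_json = json.loads(df.to_json())

# 'list' orientation: one list of values per column.
list_oriented_dict = {
    column: list(data.values()) for column, data in as_json.items()
}

# 'records' orientation, as used in the question: one dict per row.
records = json.loads(df.to_json(orient="records"))

print(list_oriented_dict)
print(records)
```

The values stay numeric, but they are rounded to 10 decimal places along the way, which is a trade-off if the data needs more precision than that.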

Oleksandr Zaitsev answered Sep 28 '22 06:09