
Pandas to_dict unwantedly modifying float numbers

Tags: python, json, floating-point, pandas, csv

My code below takes in CSV data and uses the pandas to_dict() function as one step in converting the data to JSON. The problem is that it modifies the float numbers (e.g. 1.6 becomes 1.6000000000000001). I am not concerned about the loss of accuracy, but users will see the changed numbers, and that looks amateurish.

I am aware that:

this has come up here before, but that was two years ago and it was never really answered well,
I also have an additional complication: the data frames I want to convert to dictionaries could contain any combination of datatypes

As such, the issues with the previous solutions are:

Converting all the numbers to objects only works if you don't need to use the numbers numerically. I want the option to calculate sums and averages, which reintroduces the additional-decimals issue.
Forcing the numbers to be rounded to x decimals will either reduce accuracy or add unnecessary trailing zeros, depending on the data the user provides

My question:

Is there a better way to ensure the numbers are not being modified, but are kept in a numeric datatype? Is it a question of changing how I import the CSV data in the first place? Surely there is a simple solution I am overlooking?

Here is a simple script that will reproduce this bug:

import pandas as pd

import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO

CSV_Data = "Index,Column_1,Column_2,Column_3,Column_4,Column_5,Column_6,Column_7,Column_8\nindex_1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8\nindex_2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8\nindex_3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8\nindex_4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8"

input_data = StringIO(CSV_Data)
# DataFrame.from_csv was removed in pandas 1.0; read_csv is the equivalent call
df = pd.read_csv(input_data, header=0, sep=',', index_col=0, encoding='utf-8')
print(df.to_dict(orient='records'))
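
A side note on the numbers themselves: 1.6 and 1.6000000000000001 are two spellings of the same double-precision value, so the data may not actually be modified, only displayed with more digits. The minimal sketch below assumes the long form comes from 17-significant-digit formatting (as older NumPy versions are understood to have used for float64); that cause is an assumption, not something confirmed for this script. The variable names are illustrative only.

```python
# Python's repr() picks the shortest string that round-trips to the same double.
value = 1.6
shortest = repr(value)

# Formatting with 17 significant digits always round-trips too,
# but exposes the binary approximation error.
seventeen_digits = format(value, ".17g")

# Both spellings parse back to the identical float.
same_double = float(seventeen_digits) == value

print(shortest, seventeen_digits, same_double)
```

If this is what is happening, the choice of formatter at output time matters more than how the CSV is imported.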


asked Apr 18 '16 13:04

Brett Romero

1 Answer

I needed the equivalent of df.to_dict('list') with the float numbers shown correctly, but df.to_json() doesn't support orient='list' yet. So I do the following:

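import json

# df.to_json() defaults to orient='columns': {column: {index_label: value}}.
# Keep only the values of each column to get a list-oriented dict.
list_oriented_dict = {
    column: list(data.values())
    for column, data in json.loads(df.to_json()).items()
}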

Not the best way, but it works for me. Maybe someone has a more elegant solution?
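
The same round trip should also work for the 'records' orientation used in the question, since to_json() does support orient='records'. The sketch below is a minimal illustration under a few assumptions: the CSV sample and the names csv_data and records are made up for the example, and it relies on to_json() writing floats with its default double_precision of 10 decimal places, so values that need more than 10 decimals would be rounded.

```python
import json
from io import StringIO

import pandas as pd

# Small illustrative sample (hypothetical data, shaped like the question's CSV).
csv_data = (
    "Index,Column_1,Column_2\n"
    "index_1,1.1,1.6\n"
    "index_2,2.1,2.6\n"
)

df = pd.read_csv(StringIO(csv_data), index_col=0)

# Let pandas write the JSON text, then parse it back into plain Python
# dicts holding built-in floats rather than NumPy scalars.
records = json.loads(df.to_json(orient="records"))

print(records)
```

The DataFrame itself keeps its numeric dtypes, so sums and averages can still be computed on df before serializing.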


answered Sep 28 '22 06:09

Oleksandr Zaitsev
