My code below takes in CSV data and uses the pandas to_dict() function as one step in converting the data to JSON. The problem is that it modifies the float numbers (e.g. 1.6 becomes 1.6000000000000001). I am not concerned about the loss of accuracy, but because users will see the changed numbers, it looks amateurish.
I am aware:
As such, the issues with the previous solutions are:
Is there a better way to ensure the numbers are not being modified, but are kept in a numeric datatype? Is it a question of changing how I import the CSV data in the first place? Surely there is a simple solution I am overlooking?
Here is a simple script that will reproduce this bug:
```python
import sys
import pandas as pd
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
CSV_Data = (
    "Index,Column_1,Column_2,Column_3,Column_4,Column_5,Column_6,Column_7,Column_8\n"
    "index_1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8\n"
    "index_2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8\n"
    "index_3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8\n"
    "index_4,4.1,4.2,4.3,4.4,4.5,4.6,4.7,4.8"
)
input_data = StringIO(CSV_Data)
# DataFrame.from_csv was deprecated and has since been removed; read_csv replaces it.
df = pd.read_csv(input_data, header=0, sep=",", index_col=0, encoding="utf-8")
print(df.to_dict(orient="records"))
```
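One commonly suggested fix is to change how the CSV is parsed rather than how the dict is built. pandas.read_csv accepts float_precision="round_trip", which makes the C parser use Python's own float conversion, so each value should be the closest double to the text in the file. The sketch below uses a shortened, made-up version of the sample data; note that recent pandas versions already default to a more precise converter, so the original script may not show the problem there at all.

```python
from io import StringIO

import pandas as pd

# Shortened sample data (illustrative only).
CSV_DATA = (
    "Index,Column_1,Column_2\n"
    "index_1,1.1,1.6\n"
    "index_2,2.1,2.6\n"
)

# float_precision="round_trip" asks the C parser to convert floats with
# Python's own string-to-double routine instead of its faster converter.
df = pd.read_csv(StringIO(CSV_DATA), index_col=0, float_precision="round_trip")

records = df.to_dict(orient="records")
print(records)
```

The values stay numeric (float64) in the DataFrame, so nothing has to be converted to strings to get clean output.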
Use the pandas DataFrame.astype() function to convert a column from string or int to float; you can apply it to a specific column or to an entire DataFrame. To cast the data type to a 64-bit float, you can use numpy.float64, among other type specifiers.
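A minimal sketch of both uses of astype() described above, assuming a made-up DataFrame with one text column ("price") and one integer column ("qty"):

```python
import numpy as np
import pandas as pd

# Hypothetical data: prices were read as text, quantities as integers.
df = pd.DataFrame({"price": ["1.6", "2.5"], "qty": [3, 4]})

# Convert a single column to 64-bit float...
df["price"] = df["price"].astype(np.float64)

# ...or convert every column of the DataFrame at once.
df = df.astype(np.float64)

print(df.dtypes)
```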
DataFrame.round() rounds a DataFrame to a variable number of decimal places. It can round different columns to different numbers of places.
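As a display-oriented workaround, DataFrame.round() can snap values back to the intended number of decimals. This sketch assumes you know how many decimal places the source data has; the columns "a" and "b" are made up, and 0.1 + 0.2 is used only because it reliably produces 0.30000000000000004.

```python
import pandas as pd

# 0.1 + 0.2 and 1.1 + 2.2 are not exactly 0.3 and 3.3 in binary floating point.
df = pd.DataFrame({"a": [0.1 + 0.2, 1.1 + 2.2], "b": [0.123456, 2.5]})

# Round every column to 1 decimal place...
rounded_all = df.round(1)

# ...or give each column its own number of decimal places.
rounded_per_column = df.round({"a": 1, "b": 2})

print(rounded_per_column.to_dict(orient="list"))
```

Rounding discards real digits if you pick too few places, so it is only safe when the input precision is known.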
To convert a column that contains a mix of float and NaN values to int, first replace the NaN values with zero and then convert with astype(). Use DataFrame.fillna() to replace the NaN values with the integer value zero.
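A short sketch of the fillna-then-astype sequence, assuming a hypothetical "count" column:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value; its dtype is float64 because of the NaN.
df = pd.DataFrame({"count": [1.0, np.nan, 3.0]})

# astype(int) raises an error on NaN, so replace NaN with 0 first.
df["count"] = df["count"].fillna(0).astype(int)

print(df["count"].tolist())
```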
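I need the output of df.to_dict('list'), but with the correct float numbers. However, df.to_json() doesn't support orient='list' yet, so I do the following:

```python
import json

# df.to_json() encodes floats itself (double_precision=10 decimal places by
# default), so parsing its output back gives values such as 1.6.
list_oriented_dict = {
    column: list(data.values())
    for column, data in json.loads(df.to_json()).items()
}
```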
Not the best way, but it works for me. Maybe someone has a more elegant solution?
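One possibly simpler alternative, if you control how the CSV is read, is to parse it with float_precision="round_trip" and call to_dict(orient="list") directly, skipping the JSON round trip. The helper name csv_to_list_json and the sample data are hypothetical:

```python
import json
from io import StringIO

import pandas as pd


def csv_to_list_json(csv_text):
    """Hypothetical helper: CSV text -> JSON object mapping column -> list of values."""
    # "round_trip" uses Python's float conversion when parsing the CSV.
    df = pd.read_csv(StringIO(csv_text), index_col=0, float_precision="round_trip")
    # to_dict(orient="list") maps each column name to a plain list of its values.
    return json.dumps(df.to_dict(orient="list"))


csv_text = "Index,Column_1,Column_2\nindex_1,1.1,1.6\nindex_2,2.1,2.6\n"
print(csv_to_list_json(csv_text))
```

Unlike the to_json() route, this does not cap values at 10 decimal places; it relies on the parsed doubles already being the closest ones to the CSV text.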