I observed the following behavior with DataFrame.to_json:
>>> df = pd.DataFrame([[eval(f'1.12345e-{i}') for i in range(8, 20)]])
>>> df
0 1 2 3 4 5 6 7 8 9 10 11
0 1.123450e-08 1.123450e-09 1.123450e-10 1.123450e-11 1.123450e-12 1.123450e-13 1.123450e-14 1.123450e-15 1.123450e-16 1.123450e-17 1.123450e-18 1.123450e-19
>>> print(df.to_json(indent=2, orient='index'))
{
"0":{
"0":0.0000000112,
"1":0.0000000011,
"2":0.0000000001,
"3":0.0,
"4":0.0,
"5":0.0,
"6":0.0,
"7":0.0,
"8":1.12345e-16,
"9":1.12345e-17,
"10":1.12345e-18,
"11":1.12345e-19
}
}
So all numbers down to 1e-16 seem to be rounded to 10 decimal places (in agreement with the default value of double_precision), but all smaller values are represented exactly. Why is this the case, and how can I turn off decimal rounding for the larger values too (i.e. use scientific notation instead)?
>>> pd.__version__
'1.3.1'
For reference, the standard library's json module doesn't do this:
>>> import json
>>> print(json.dumps([eval(f'1.12345e-{i}') for i in range(8, 20)], indent=2))
[
1.12345e-08,
1.12345e-09,
1.12345e-10,
1.12345e-11,
1.12345e-12,
1.12345e-13,
1.12345e-14,
1.12345e-15,
1.12345e-16,
1.12345e-17,
1.12345e-18,
1.12345e-19
]
I'm not sure how to achieve this with pd.DataFrame.to_json, but we can combine pd.DataFrame.to_dict, json, and pd.read_json to get a full-precision JSON representation of a pandas DataFrame.
>>> json_df = json.dumps(df.to_dict('index'), indent=2)
>>> print(json_df)
{
"0": {
"0": 1.12345e-08,
"1": 1.12345e-09,
"2": 1.12345e-10,
"3": 1.12345e-11,
"4": 1.12345e-12,
"5": 1.12345e-13,
"6": 1.12345e-14,
"7": 1.12345e-15,
"8": 1.12345e-16,
"9": 1.12345e-17,
"10": 1.12345e-18,
"11": 1.12345e-19
}
}
To read it back in, we can then do:
>>> pd.read_json(json_df, orient='index')
0 1 2 ... 9 10 11
0 1.123450e-08 1.123450e-09 1.123450e-10 ... 1.123450e-17 1.123450e-18 1.123450e-19
[1 rows x 12 columns]
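As a sanity check on the approach above, here is a small sketch (assuming the same example frame) that dumps via to_dict, parses the JSON back with json.loads, and verifies the floats survive bit-for-bit. This works because json.dumps serializes floats with repr, which Python guarantees round-trips exactly.

```python
import json
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame([[float(f'1.12345e-{i}') for i in range(8, 20)]])

# Dump via to_dict so pandas' own JSON writer never touches the floats.
json_df = json.dumps(df.to_dict('index'), indent=2)

# Parse it back and compare every value with exact equality (no rounding).
parsed = json.loads(json_df)
for col, original in df.iloc[0].items():
    assert parsed['0'][str(col)] == original
```

Note that json.dumps turns the integer row/column labels into strings, which is why the comparison indexes with '0' and str(col).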
This comes from the pandas codebase, in pandas/io/json/_json.py: by default the precision is 10, as you can see in the signature below.
def to_json(
path_or_buf,
obj,
orient: Optional[str] = None,
date_format: str = "epoch",
double_precision: int = 10,
force_ascii: bool = True,
date_unit: str = "ms",
default_handler: Optional[Callable[[Any], JSONSerializable]] = None,
lines: bool = False,
compression: Optional[str] = "infer",
index: bool = True,
    indent: int = 0,
    ...
):
If you apply the maximum precision, you get the following (the output below was evidently produced from a wider frame covering exponents 8 through 39):
>>> print(df.to_json(indent=2, orient='records', double_precision=15))
[
{
"0":0.0000000112345,
"1":0.00000000112345,
"2":0.000000000112345,
"3":0.000000000011234,
"4":0.000000000001123,
"5":0.000000000000112,
"6":0.000000000000011,
"7":0.000000000000001,
"8":1.12345e-16,
"9":1.12345e-17,
"10":1.12345e-18,
"11":1.12345e-19,
"12":1.12345e-20,
"13":1.12345e-21,
"14":1.12345e-22,
"15":1.12345e-23,
"16":1.12345e-24,
"17":1.12345e-25,
"18":1.12345e-26,
"19":1.12345e-27,
"20":1.12345e-28,
"21":1.12345e-29,
"22":1.12345e-30,
"23":1.12345e-31,
"24":1.12345e-32,
"25":1.12345e-33,
"26":1.12345e-34,
"27":1.12345e-35,
"28":1.12345e-36,
"29":1.12345e-37,
"30":1.12345e-38,
"31":1.12345e-39
}
]
Note: you will get a ValueError if you use a precision greater than 15:
ValueError: Invalid value '20' for option 'double_precision', max is '15'
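A minimal sketch of that validation in action (assuming the example frame from the question): pandas rejects any double_precision above 15 before serializing anything.

```python
import pandas as pd

df = pd.DataFrame([[1.12345e-16]])

# double_precision is validated up front; anything above 15 is rejected.
err = None
try:
    df.to_json(double_precision=20)
except ValueError as exc:
    err = exc

print(err)
```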
So, in a sense, this is not the same as json.dumps. pd.DataFrame.to_json uses the internal library pandas._libs.json rather than the standard json module, which explains the difference in behavior. The former "normalizes" the numbers internally and does not expose an API to control it. As a result, you have the following options:
Either convert to a dictionary and dump it using the standard json library (as mentioned earlier):
>>> print(json.dumps(df.to_dict(orient='records'), indent=2))
[
{
"0": 1.12345e-08,
"1": 1.12345e-09,
"2": 1.12345e-10,
"3": 1.12345e-11,
"4": 1.12345e-12,
"5": 1.12345e-13,
"6": 1.12345e-14,
"7": 1.12345e-15,
"8": 1.12345e-16,
"9": 1.12345e-17,
"10": 1.12345e-18,
"11": 1.12345e-19
}
]
This is a perfectly legitimate solution.
Alternatively, you can use the CSV format instead of JSON and specify the desired floating-point format:
>>> print(df.to_csv(float_format='%.10e', index=False))
0,1,2,3,4,5,6,7,8,9,10,11
1.1234500000e-08,1.1234500000e-09,1.1234500000e-10,1.1234500000e-11,1.1234500000e-12,1.1234500000e-13,1.1234500000e-14,1.1234500000e-15,1.1234500000e-16,1.1234500000e-17,1.1234500000e-18,1.1234500000e-19
Another option is to convert the values to strings before the "normalization" kicks in:
>>> print(df.astype(str).to_json(indent=2, orient='index'))
{
"0":{
"0":"1.12345e-08",
"1":"1.12345e-09",
"2":"1.12345e-10",
"3":"1.12345e-11",
"4":"1.12345e-12",
"5":"1.12345e-13",
"6":"1.12345e-14",
"7":"1.12345e-15",
"8":"1.12345e-16",
"9":"1.12345e-17",
"10":"1.12345e-18",
"11":"1.12345e-19"
}
}
The conversion to string will require special attention when reading back the JSON.
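One way to handle that reading step, sketched with the example frame from the question: cast back to float explicitly after read_json, so you do not depend on pandas' automatic dtype inference for the string values.

```python
import io
import pandas as pd

df = pd.DataFrame([[float(f'1.12345e-{i}') for i in range(8, 20)]])

# Dump via astype(str) so pandas' JSON writer never rounds the floats...
json_text = df.astype(str).to_json(orient='index')

# ...then restore and cast back to float explicitly. float() parsing of
# repr-style strings is exact, so no precision is lost either way.
restored = pd.read_json(io.StringIO(json_text), orient='index').astype(float)
assert (restored.iloc[0].values == df.iloc[0].values).all()
```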
And finally, if you need the exact values, just use a binary format such as parquet or pickle.