How to get an exact representation of floats during `DataFrame.to_json`?

Tags:

python

json

floating-point

pandas

I observed the following behavior with DataFrame.to_json:

>>> df = pd.DataFrame([[eval(f'1.12345e-{i}') for i in range(8, 20)]])
>>> df
             0             1             2             3             4             5             6             7             8             9             10            11
0  1.123450e-08  1.123450e-09  1.123450e-10  1.123450e-11  1.123450e-12  1.123450e-13  1.123450e-14  1.123450e-15  1.123450e-16  1.123450e-17  1.123450e-18  1.123450e-19
>>> print(df.to_json(indent=2, orient='index'))
{
  "0":{
    "0":0.0000000112,
    "1":0.0000000011,
    "2":0.0000000001,
    "3":0.0,
    "4":0.0,
    "5":0.0,
    "6":0.0,
    "7":0.0,
    "8":1.12345e-16,
    "9":1.12345e-17,
    "10":1.12345e-18,
    "11":1.12345e-19
  }
}

So all numbers larger than about 1e-16 seem to be rounded to 10 decimal places (in agreement with the default value for double_precision), but all smaller values are represented exactly. Why is this the case, and how can I turn off decimal rounding for the larger values too (i.e. use scientific notation instead)?

>>> pd.__version__
'1.3.1'

For reference, the standard library's json module doesn't do this:

>>> import json
>>> print(json.dumps([eval(f'1.12345e-{i}') for i in range(8, 20)], indent=2))
[
  1.12345e-08,
  1.12345e-09,
  1.12345e-10,
  1.12345e-11,
  1.12345e-12,
  1.12345e-13,
  1.12345e-14,
  1.12345e-15,
  1.12345e-16,
  1.12345e-17,
  1.12345e-18,
  1.12345e-19
]

asked Aug 24 '21 10:08

a_guest

3 Answers

I'm not sure how to achieve this with pd.DataFrame.to_json alone, but we can combine pd.DataFrame.to_dict, the standard json module, and pd.read_json to get a full-precision JSON representation of a pandas DataFrame.

>>> json_df = json.dumps(df.to_dict('index'), indent=2)
>>> print(json_df)
{
  "0": {
    "0": 1.12345e-08,
    "1": 1.12345e-09,
    "2": 1.12345e-10,
    "3": 1.12345e-11,
    "4": 1.12345e-12,
    "5": 1.12345e-13,
    "6": 1.12345e-14,
    "7": 1.12345e-15,
    "8": 1.12345e-16,
    "9": 1.12345e-17,
    "10": 1.12345e-18,
    "11": 1.12345e-19
  }
}

To read it back in, we can then do:

>>> pd.read_json(json_df, orient='index')
             0             1             2   ...            9             10            11
0  1.123450e-08  1.123450e-09  1.123450e-10  ...  1.123450e-17  1.123450e-18  1.123450e-19

[1 rows x 12 columns]
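
The round trip is lossless because the standard json module writes each float using its repr, which is the shortest string that parses back to the same double. A minimal sketch using only the standard library; the nested dict is a hand-written stand-in for what df.to_dict('index') returns, not actual pandas output:

```python
import json

# Hand-written stand-in for df.to_dict('index'): one row, twelve small floats.
values = [float(f"1.12345e-{i}") for i in range(8, 20)]
row = {str(col): v for col, v in enumerate(values)}
payload = {"0": row}

text = json.dumps(payload, indent=2)
restored = json.loads(text)

# Compare every value after the text round trip.
exact = all(restored["0"][k] == row[k] for k in row)
print(exact)
```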

answered Oct 21 '22 16:10

maneblusser

The default comes from the to_json signature in /pandas/io/json/_json.py, where double_precision is set to 10. Here is the relevant part of the signature:

def to_json(
    path_or_buf,
    obj,
    orient: Optional[str] = None,
    date_format: str = "epoch",
    double_precision: int = 10,
    force_ascii: bool = True,
    date_unit: str = "ms",
    default_handler: Optional[Callable[[Any], JSONSerializable]] = None,
    lines: bool = False,
    compression: Optional[str] = "infer",
    index: bool = True,
    indent: int = 0,

If you raise double_precision to its maximum of 15, you get the output below (taken from a wider frame with 32 columns, which is why the keys run up to "31"):

>>> print(df.to_json(indent=2, orient='records', double_precision=15))
[
  {
    "0":0.0000000112345,
    "1":0.00000000112345,
    "2":0.000000000112345,
    "3":0.000000000011234,
    "4":0.000000000001123,
    "5":0.000000000000112,
    "6":0.000000000000011,
    "7":0.000000000000001,
    "8":1.12345e-16,
    "9":1.12345e-17,
    "10":1.12345e-18,
    "11":1.12345e-19,
    "12":1.12345e-20,
    "13":1.12345e-21,
    "14":1.12345e-22,
    "15":1.12345e-23,
    "16":1.12345e-24,
    "17":1.12345e-25,
    "18":1.12345e-26,
    "19":1.12345e-27,
    "20":1.12345e-28,
    "21":1.12345e-29,
    "22":1.12345e-30,
    "23":1.12345e-31,
    "24":1.12345e-32,
    "25":1.12345e-33,
    "26":1.12345e-34,
    "27":1.12345e-35,
    "28":1.12345e-36,
    "29":1.12345e-37,
    "30":1.12345e-38,
    "31":1.12345e-39
  }
]

Note: you will get a ValueError if you pass a precision greater than 15.

ValueError: Invalid value '20' for option 'double_precision', max is '15'

So in this respect to_json does not behave the same as json.dumps.
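
One way to see why raising double_precision only helps up to a point: the option counts digits after the decimal point, not significant digits, so anything smaller than the last decimal place is rounded away. The sketch below mimics that effect with Python's fixed-point formatting. It illustrates the rounding only and is not pandas' actual encoder.

```python
x = 1.12345e-11

# Fixed-point formatting keeps a fixed number of digits after the decimal point.
ten_places = f"{x:.10f}"      # nothing of x survives 10 decimal places
fifteen_places = f"{x:.15f}"  # 15 decimal places keep only a few significant digits

# repr() instead emits the shortest string that parses back to the same double.
shortest = repr(x)

print(ten_places, float(ten_places) == x)
print(fifteen_places, float(fifteen_places) == x)
print(shortest, float(shortest) == x)
```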

answered Oct 21 '22 15:10

Karn Kumar

pd.DataFrame.to_json uses the internal library pandas._libs.json, not the standard json module. This explains the difference in behavior. The former "normalizes" the numbers internally and does not expose an API to control this. As a result, you have the following options:

Convert to a dictionary and dump it using the standard json library (as mentioned earlier):

>>> print(json.dumps(df.to_dict(orient='records'), indent=2))

[
  {
    "0": 1.12345e-08,
    "1": 1.12345e-09,
    "2": 1.12345e-10,
    "3": 1.12345e-11,
    "4": 1.12345e-12,
    "5": 1.12345e-13,
    "6": 1.12345e-14,
    "7": 1.12345e-15,
    "8": 1.12345e-16,
    "9": 1.12345e-17,
    "10": 1.12345e-18,
    "11": 1.12345e-19
  }
]

This is a perfectly legitimate solution.

Alternatively, use CSV instead of JSON and specify the desired floating-point format:

>>> print(df.to_csv(float_format='%.10e', index=False))

0,1,2,3,4,5,6,7,8,9,10,11
1.1234500000e-08,1.1234500000e-09,1.1234500000e-10,1.1234500000e-11,1.1234500000e-12,1.1234500000e-13,1.1234500000e-14,1.1234500000e-15,1.1234500000e-16,1.1234500000e-17,1.1234500000e-18,1.1234500000e-19
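
The CSV output is only as exact as the format string. '%.10e' keeps 11 significant digits, which is enough for these sample values but not for an arbitrary double; 17 significant digits ('%.16e') are enough to round-trip any IEEE 754 double. A quick standard-library check, using 0.1 + 0.2 as an arbitrary test value that is not part of the original data:

```python
x = 0.1 + 0.2  # a double that needs 17 significant digits to be written exactly

short = "%.10e" % x   # 11 significant digits
full = "%.16e" % x    # 17 significant digits

# Parse each string back and compare with the original value.
print(short, float(short) == x)
print(full, float(full) == x)
```

If the values must survive exactly, passing float_format='%.16e' to to_csv is the safer choice, at the cost of longer output.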

Another option is to convert values to string before the "normalization" kicks in:

>>> print(df.astype(str).to_json(indent=2, orient='index'))

{
  "0":{
    "0":"1.12345e-08",
    "1":"1.12345e-09",
    "2":"1.12345e-10",
    "3":"1.12345e-11",
    "4":"1.12345e-12",
    "5":"1.12345e-13",
    "6":"1.12345e-14",
    "7":"1.12345e-15",
    "8":"1.12345e-16",
    "9":"1.12345e-17",
    "10":"1.12345e-18",
    "11":"1.12345e-19"
  }
}

The conversion to string will require special attention when reading back the JSON.
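
A minimal sketch of that extra step, using only the standard library: the values arrive as JSON strings, so they have to be converted back with float() after parsing. The text literal is a shortened, hand-copied piece of the output above.

```python
import json

# Shortened copy of the string-valued JSON produced above.
text = '{"0": {"0": "1.12345e-08", "8": "1.12345e-16", "11": "1.12345e-19"}}'

parsed = json.loads(text)
# Values are str at this point; convert each one explicitly.
row = {col: float(val) for col, val in parsed["0"].items()}

print(type(parsed["0"]["0"]).__name__)
print(row["8"] == 1.12345e-16)
```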

And finally, if you need the exact values, just use a binary format such as parquet or pickle.
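
As a sketch of why a binary format sidesteps the question: with its default binary protocol, pickle stores the bytes of each double rather than a decimal rendering, so nothing is rounded. The example uses the standard pickle module on a plain list; DataFrame.to_pickle and pd.read_pickle are the pandas counterparts.

```python
import pickle

values = [float(f"1.12345e-{i}") for i in range(8, 20)]

blob = pickle.dumps(values)   # binary serialization, no decimal formatting involved
restored = pickle.loads(blob)

print(restored == values)
```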