
How to define dtype on read_json?

I'm trying to read the following JSON into a DataFrame:

[{"col1": 900000000000000000000}]

When I run pd.read_json('sample.json'), I receive the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 467, in read
    obj = self._get_object_parser(self.data)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 576, in parse
    self._parse_no_numpy()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Value is too big

I've tried a few different ways to define the dtype on read, such as:

  • df = pd.read_json('sample.json', dtype={'col1': np.dtype('object')})
  • df = pd.read_json('sample.json', dtype={'col1': np.object})
  • df = pd.read_json('sample.json', dtype={'col1': str})

Interestingly, if I quote the value so that my input looks like the following, it works just fine with the dtype set to float64:

[{"col1": "900000000000000000000"}]

Unfortunately, that's not what my input will look like.

Any idea why I'm not able to define the dtype on read? Thanks.

chris.mclennon asked Sep 14 '25 01:09


1 Answer

First, use json.loads and load in all the data that isn't problematic (in this case, everything besides col1).

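import json
import pandas as pd

# Placeholder input; {....} stands for the remaining records.
json_data = '''[{"col1": 900000000000000000000, "col2": "abc"}, {....}]'''
data = json.loads(json_data)

# Every column except col1, kept in the original key order.
c = [k for k in data[0] if k != 'col1']
df = pd.DataFrame.from_records(data, columns=c)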

Now, we'll have to manually extract col1's data, convert it to a dtype=object Series, and then add it.

df.insert(0, 'col1', pd.Series([d['col1'] for d in data], dtype=object))

df
                    col1 col2
0  900000000000000000000  abc
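
For reference, here is a minimal end-to-end sketch of the two steps above. The two-record input is made up for illustration (the second record and the names other_cols and big are my own assumptions, not part of the question). The traceback in the question shows the ValueError being raised inside loads, before any dtype is applied, which is presumably why passing dtype to read_json does not help; the standard-library json.loads returns ordinary Python ints instead, so it has no such limit.

```python
import json

import pandas as pd

# Hypothetical two-record input; the first col1 value does not fit in 64 bits.
json_data = '[{"col1": 900000000000000000000, "col2": "abc"}, {"col1": 5, "col2": "def"}]'

# The standard-library parser returns arbitrary-precision Python ints.
data = json.loads(json_data)

# Build the frame from every column except the oversized one.
other_cols = [k for k in data[0] if k != "col1"]
df = pd.DataFrame.from_records(data, columns=other_cols)

# Add col1 back as an object column so the Python ints are stored unchanged.
big = pd.Series([d["col1"] for d in data], dtype=object)
df.insert(0, "col1", big)

print(df.dtypes["col1"])              # dtype of the re-added column
print(df.loc[0, "col1"] > 2**63 - 1)  # value is larger than the int64 maximum
```

Keep in mind that an object column holds plain Python ints, so arithmetic on it will be slower than on a native int64 or float64 column.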
cs95 answered Sep 16 '25 16:09