So here is the standard way to read in a JSON file in Python:
import json
from pprint import pprint

with open('ig001.json') as data_file:
    data = json.load(data_file)
pprint(data)
However, the JSON file I want to read has multiple JSON objects in it, so it looks something like:
[{},{}.... ]
[{},{}.... ]
This represents 2 top-level JSON arrays written back to back, and inside each {} object there are a bunch of key:value pairs.
So when I try to read this using the standard read code that I have above, I get the error:
Traceback (most recent call last):
  File "jsonformatter.py", line 5, in <module>
    data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3889 column 2 - line 719307 column 2 (char 164691 - 30776399)
Line 3889 is where the first JSON object ends and the next one begins. The line itself looks like "][".
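For reference, the same kind of error can be reproduced on a tiny string. This is only a sketch: the `text` value below is a made-up stand-in for the real file, and the exact wording and position in the message depend on the Python version.

```python
import json

# Made-up stand-in for the real file: two top-level arrays back to back.
text = '[{"a": 1}][{"b": 2}]'

try:
    json.loads(text)
    message = None
except ValueError as exc:
    # On Python 3 this is a json.JSONDecodeError, which subclasses ValueError.
    message = str(exc)

print(message)
```

The parser reads the first array successfully and then complains about everything that follows it, which is why the error points at the "][" line.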
Any ideas on how to fix this would be appreciated, thanks!
Without a link to your JSON file, I'm going to have to make some assumptions about its structure.
To fix this:
import json

with open('ig001.json') as data_file:
    # we're going to need the entire file in memory
    raw_data = data_file.read()

# 1. replace instances of `][` with `]<SPLIT>[`
# (`<SPLIT>` needs to be something that is not present anywhere in the file to begin with)
tweaked_data = raw_data.replace('][', ']<SPLIT>[')
# 2. split the string into an array of strings, using the chosen split indicator
split_data = tweaked_data.split('<SPLIT>')
# 3. load each string individually
parsed_data = [json.loads(bit_of_data) for bit_of_data in split_data]
(pardon the horrible variable names)
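One caveat with the replace-and-split approach is that it would also split on a `][` that happens to sit inside a string value. As an alternative sketch (not part of the approach above), the standard library's `json.JSONDecoder.raw_decode` can parse one value at a time and report where it stopped, so no marker is needed. The helper name `iter_concatenated_json` and the `sample` string are made up for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Made-up sample: three arrays, one containing "][" inside a string.
sample = '[{"a": 1}, {"b": 2}][{"c": "x][y"}]\n[3]'
parsed = list(iter_concatenated_json(sample))
print(parsed)
```

With a real file you would pass in the result of `data_file.read()`, so this still needs the whole file in memory, same as the version above.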