Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python json.loads shows ValueError: Extra data

Tags:

python

json

People also ask

How do I fix raise JSONDecodeError extra data's end?

JSONDecodeError: Extra data" occurs when we try to parse multiple objects without wrapping them in an array. To solve the error, wrap the JSON objects in an array or declare a new property that points to an array value that contains the objects.

How do I read a JSON file in Python?

Reading From JSON It's pretty easy to load a JSON object in Python. Python has a built-in package called json, which can be used to work with JSON data. It's done by using the JSON module, which provides us with a lot of methods which among loads() and load() methods are gonna help us to read the JSON file.

How do you read a JSON file line by line in Python?

Python read JSON file line by lineStep 1: import json module. Step 3: Read the json file using open() and store the information in file variable. Step 4: Convert item from json to python using load() & store the information in db variable. Step 5: append db in lineByLine empty list.


As you can see in the following example, json.loads (and json.load) does not decode multiple json object.

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list, dump the list (instead of dumping dictionaries multiple times)

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]

You can just read from a file, jsonifying each line as you go:

tweets = []
for line in open('tweets.json', 'r'):
    tweets.append(json.loads(line))

This avoids storing intermediate python objects. As long as your write one full tweet per append() call, this should work.


I came across this because I was trying to load a JSON file dumped from MongoDB. It was giving me an error

JSONDecodeError: Extra data: line 2 column 1

The MongoDB JSON dump has one object per line, so what worked for me is:

import json
data = [json.loads(line) for line in open('data.json', 'r')]

This may also happen if your JSON file is not just 1 JSON record. A JSON record looks like this:

[{"some data": value, "next key": "another value"}]

It opens and closes with a bracket [ ], within the brackets are the braces { }. There can be many pairs of braces, but it all ends with a close bracket ]. If your json file contains more than one of those:

[{"some data": value, "next key": "another value"}]
[{"2nd record data": value, "2nd record key": "another value"}]

then loads() will fail.

I verified this with my own file that was failing.

import json

guestFile = open("1_guests.json",'r')
guestData = guestFile.read()
guestFile.close()
gdfJson = json.loads(guestData)

This works because 1_guests.json has one record []. The original file I was using all_guests.json had 6 records separated by newline. I deleted 5 records, (which I already checked to be bookended by brackets) and saved the file under a new name. Then the loads statement worked.

Error was

   raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 10 column 1 (char 261900 - 6964758)

PS. I use the word record, but that's not the official name. Also, if your file has newline characters like mine, you can loop through it to loads() one record at a time into a json variable.


I just got the same error while my json file is like this

{"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"}
{"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}

And I found it malformed, so I changed it to:

{
  "datas":[
    {"id":"1101010","city_id":"1101","name":"TEUPAH SELATAN"},
    {"id":"1101020","city_id":"1101","name":"SIMEULUE TIMUR"}
  ]
}

One-liner for your problem:

data = [json.loads(line) for line in open('tweets.json', 'r')]