
Parsing incomplete json array

I have downloaded the first 5MB of a very large JSON file, and I need to load that 5MB to generate a preview of the file. However, the downloaded part will probably be cut off mid-document. Here's an example of what it may look like:

[{
    "first": "bob",
    "address": {
        "street": 13301,
        "zip": 1920
    }
}, {
    "first": "sarah",
    "address": {
        "street": 13301,
        "zip": 1920
    }
}, {"first" : "tom"

From here, I'd like to "rebuild" it so that I can parse the first two objects (and ignore the third).

Is there a JSON parser that can infer or cut off the end of the string to make it parsable? Or one that can 'stream' the parsing of the JSON array, so that when it fails on the last object, I can exit the loop? If not, how could the above be accomplished?

David542 asked Dec 26 '18 21:12




1 Answer

If your data will always look somewhat similar, you could do something like this:

import json

json_string = """[{
    "first": "bob",
    "address": {
        "street": 13301,
        "zip": 1920
    }
}, {
    "first": "sarah",
    "address": {
        "street": 13301,
        "zip": 1920
    }
}, {"first" : "tom"
"""

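# Trim one character at a time from the end until closing the array
# with "]" yields valid JSON.
while True:
    if not json_string:
        raise ValueError("Couldn't fix JSON")
    try:
        # Try to parse what is left, with the missing "]" appended.
        data = json.loads(json_string + "]")
    except json.decoder.JSONDecodeError:
        # Still invalid: drop the last character and retry.
        json_string = json_string[:-1]
        continue
    break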

print(data)

This assumes that the data is a list of dicts. On each pass, the missing ] is appended and the result is parsed. If the string can be interpreted as JSON, the loop breaks. Otherwise the last character is removed and the next pass tries again. If there are no characters left, ValueError("Couldn't fix JSON") is raised. Note that every pass re-parses the whole string, so this can be slow when a long tail has to be trimmed from a multi-megabyte input.

For the above example, it prints:

[{'first': 'bob', 'address': {'street': 13301, 'zip': 1920}}, {'first': 'sarah', 'address': {'street': 13301, 'zip': 1920}}]
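
The question also asks about "streaming" the array and stopping at the first element that fails. A minimal sketch of that idea, using the standard library's json.JSONDecoder.raw_decode to read one element at a time, might look like the following. The function name parse_truncated_array is made up for illustration, and the sketch assumes the top-level value is an array; it is not a general repair tool.

```python
import json


def parse_truncated_array(text):
    """Return the complete leading elements of a possibly cut-off JSON array.

    Hypothetical helper: assumes the top-level value is an array.
    """
    decoder = json.JSONDecoder()
    n = len(text)

    def skip_ws(i):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while i < n and text[i] in " \t\r\n":
            i += 1
        return i

    pos = skip_ws(0)
    if pos >= n or text[pos] != "[":
        raise ValueError("input does not start with a JSON array")
    pos = skip_ws(pos + 1)

    items = []
    while pos < n and text[pos] != "]":
        try:
            # Decode exactly one value starting at pos; end is the index
            # just past that value.
            item, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            break  # this element is cut off: keep what we have so far
        nxt = skip_ws(end)
        if nxt >= n:
            # The input ends right after this value. Objects, arrays and
            # strings carry their own closing delimiter, so they are
            # complete. A bare number or literal might be cut short
            # (34 could have been 345), so it is dropped.
            if text[end - 1] in '}]"':
                items.append(item)
            break
        if text[nxt] not in ",]":
            break  # unexpected character after a value: stop here
        items.append(item)
        if text[nxt] == "]":
            break  # the array was complete after all
        pos = skip_ws(nxt + 1)
    return items
```

Calling parse_truncated_array(json_string) on the example string above should return the same two dicts. Each element is decoded only once, so the cost does not grow with the number of characters that have to be discarded at the end.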
finefoot answered Nov 09 '22 21:11