I'm trying to parse a very large JSON file in Python. The file has 6523440 lines and consists of many separate JSON documents written one after another.
The structure looks like this:
[
{
"projects": [
...
]
}
]
[
{
"projects": [
...
]
}
]
....
....
....
and it goes on and on...
Every time I try to load it with json.load(), I get this error:
ValueError: Extra data: line 2247 column 1 - line 6523440 column 1 (char 101207 - 295464118)
The error points to the line where the first object ends and the second one begins. Is there a way to load them separately, or something similar?
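For context: json.load() accepts exactly one top-level JSON value, so once the first [ ... ] closes, everything after it is reported as "Extra data". A minimal standard-library sketch (a different technique from the streaming approach suggested in the answer) uses json.JSONDecoder.raw_decode, which parses one value starting at a given index and returns it together with the index where parsing stopped. The names iter_concatenated_json and load_documents are hypothetical, and this assumes the whole file fits in memory as one string (about 295 million characters here, judging by the offsets in the error).

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip the
        # newlines/spaces between documents ourselves.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        # Parse one value starting at pos; get back the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def load_documents(path: str) -> Iterator[Any]:
    # Hypothetical helper: reads the entire file into memory, then splits it.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    yield from iter_concatenated_json(text)


# Small stand-in for the real file: two top-level arrays back to back.
sample = '[{"projects": [1, 2]}]\n[{"projects": [3]}]\n'
docs = list(iter_concatenated_json(sample))
```

Each yielded value is one of the top-level arrays, so a nested loop (for doc in docs: for obj in doc: ...) reaches the individual {"projects": ...} objects.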
You can try using a streaming JSON library such as ijson.
Sometimes, when dealing with a particularly large JSON payload, it may be worth not constructing individual Python objects at all, and instead reacting to individual parsing events as they arrive, producing results immediately.