
Parse large JSON file in Python

Tags:

python

json

I'm trying to parse a really large JSON file in Python. The file has 6523440 lines, but it is not one JSON document: it consists of many separate top-level JSON arrays written one after another.

The structure looks like this:

[
  {
    "projects": [
     ...
    ]
  }
]
[
  {
    "projects": [
     ...
    ]
  }
]
....
....
....

and it goes on and on...

Every time I try to load it using json.load() I get this error:

ValueError: Extra data: line 2247 column 1 - line 6523440 column 1 (char 101207 - 295464118)

It is raised at the line where the first object ends and the second one starts. Is there a way to load the objects separately, or something similar?

Luka asked Jun 30 '26 00:06


1 Answer

You can try using a streaming JSON library like ijson:

Sometimes, when dealing with a particularly large JSON payload, it may be worth not constructing individual Python objects at all and instead reacting to individual events immediately, producing some result.
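If adding a dependency isn't an option, here is a rough standard-library sketch of the same idea of handling one top-level value at a time. To be clear, this is not ijson's API: it swaps in json.JSONDecoder.raw_decode(), which parses a single value from the start of a string and reports where it stopped. The helper name iter_json_values and the chunk_size parameter are made up for this illustration, and the sample string only mimics the structure from the question.

```python
import io
import json


def iter_json_values(fp, chunk_size=65536):
    """Yield each top-level JSON value from a text stream of concatenated values."""
    decoder = json.JSONDecoder()
    buf = ""
    eof = False
    while True:
        buf = buf.lstrip()  # raw_decode() does not skip leading whitespace
        if buf:
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # leftover text at end of input is not valid JSON
                # otherwise the value is probably incomplete: read more below
            else:
                yield value
                buf = buf[end:]  # drop the consumed text, keep the rest
                continue
        elif eof:
            return
        chunk = fp.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True


# Hypothetical sample in the same shape as the file from the question.
sample = '[\n  {"projects": ["a", "b"]}\n]\n[\n  {"projects": ["c"]}\n]\n'
for value in iter_json_values(io.StringIO(sample), chunk_size=16):
    print(len(value[0]["projects"]))
```

For a real file you would pass a handle opened in text mode instead of the StringIO object. Two caveats: each top-level array is still built in memory as a whole, and the decoder retries from the start of the buffer after every chunk, so if a single array is much larger than chunk_size it gets slow and a bigger chunk_size helps. The sketch also assumes the top-level values are arrays or objects, as in the question.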

shyam answered Jul 02 '26 13:07


