Opening a large JSON file

Tags: python, json, nltk

I have a 1.7 GB JSON file. When I try to open it with json.load(), it raises a MemoryError. How can I read this JSON file in Python?

My JSON file is a big array of objects containing specific keys.

Edit: If the file is just one big array of objects and the structure of the objects is known beforehand, there may be no need for extra tools: the file can be read line by line. In my file, each line contains exactly one element of the array (this depends on how the file was written, not on JSON itself), so this simply worked for me:

>>> with open('file.json', 'r') as f:
...     for line in f:  # iterating the file object yields one line at a time
...         do_something_with(line)
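Under the one-element-per-line assumption from the edit above, each line can be passed to json.loads once the bracket lines and the trailing comma are dealt with. This is a minimal stdlib sketch, assuming the layout shown in the docstring; the function name iter_array_lines is hypothetical.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_array_lines(f: TextIO) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array, one per input line.

    Assumes (hypothetically) a layout like:
        [
        {"id": 1, "name": "a"},
        {"id": 2, "name": "b"}
        ]
    JSON in general does not guarantee this; it only works if whatever wrote
    the file put every array element on its own line.
    """
    for line in f:  # the file object yields one line at a time
        line = line.strip()
        if line in ("", "[", "]"):
            continue  # skip blank lines and the array's bracket lines
        if line.endswith(","):
            line = line[:-1]  # drop the separator after all but the last element
        yield json.loads(line)


# Usage on an in-memory sample; with a real file, pass open('file.json') instead.
sample = io.StringIO('[\n{"id": 1, "name": "a"},\n{"id": 2, "name": "b"}\n]\n')
records = list(iter_array_lines(sample))
```

Only one line is held in memory at a time, so peak memory is set by the largest element rather than by the 1.7 GB file.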
Hirak Sarkar asked May 23 '12 07:05


2 Answers

You want an incremental JSON parser like yajl, together with one of its Python bindings. An incremental parser reads as little of the input as possible and invokes a callback whenever it has decoded something meaningful. For example, to pull only the numbers out of a big JSON file:

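from yajl import YajlContentHandler, YajlParser  # yajl-py bindings

list_of_numbers = []

class ContentHandler(YajlContentHandler):
    def yajl_number(self, ctx, val):
        # called once for every number token in the stream
        list_of_numbers.append(float(val))

parser = YajlParser(ContentHandler())
parser.parse(some_file)  # some_file: an open file object for the big JSON file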

See http://pykler.github.com/yajl-py/ for more info.
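yajl-py depends on the yajl C library being installed. To see the same incremental idea with no dependencies, here is a sketch that swaps in a different technique: the standard library's json.JSONDecoder.raw_decode applied to fixed-size chunks read from the file. It is not yajl and not callback-based; it is a generator that yields each complete element of a top-level array. The name iter_json_array and the default chunk_size are assumptions for illustration.

```python
import io
import json
from typing import Any, Iterator, TextIO

_decoder = json.JSONDecoder()


def iter_json_array(f: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield each element of a top-level JSON array, reading f in chunks.

    Memory use is bounded by roughly one chunk plus the largest element,
    not by the size of the whole file.
    """
    buf, started, eof = "", False, False
    while True:
        buf = buf.lstrip()  # raw_decode does not accept leading whitespace
        if not started:
            if buf.startswith("["):
                buf, started = buf[1:], True
                continue
            if buf:
                raise ValueError("expected a top-level JSON array")
        elif buf.startswith(","):
            buf = buf[1:]  # separator between elements (not strictly validated)
            continue
        elif buf.startswith("]"):
            return  # end of the array
        elif buf:
            try:
                obj, end = _decoder.raw_decode(buf)
            except json.JSONDecodeError:
                end = -1  # element is incomplete so far; need more input
            # A value that reaches the end of the buffer may be a number cut
            # off mid-chunk (e.g. "2." of "2.5"), so only accept it once a
            # "," or "]" follows, or the input is exhausted.
            if end != -1 and (buf[end:].lstrip()[:1] in (",", "]") or eof):
                yield obj
                buf = buf[end:]
                continue
        if eof:
            raise ValueError("truncated or malformed JSON array")
        chunk = f.read(chunk_size)
        eof = not chunk
        buf += chunk


# A tiny chunk_size forces elements to straddle chunk boundaries.
sample_text = '[{"id": 1}, 2.5, "x", [true, null]]'
streamed = list(iter_json_array(io.StringIO(sample_text), chunk_size=4))
```

One design trade-off: an element that spans many chunks is re-parsed from its start each time a chunk arrives, so a single huge element is slow here; a true incremental parser such as yajl avoids that by keeping its state between reads.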

georg answered Oct 03 '22 00:10


I found another Python wrapper around the yajl library: ijson.

It works better for me than yajl-py for the following reasons:

  • yajl-py did not detect the yajl library on my system; I had to hack the code to make it work
  • ijson's code is more compact and easier to use
  • ijson works with both yajl v1 and yajl v2, and it even has a pure-Python yajl replacement
  • ijson has a very nice ObjectBuilder, which helps extract not just events but meaningful sub-objects from the parsed stream, at the level you specify
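ijson's own entry point for this is ijson.items(f, prefix); for a file that is one big top-level array, the prefix is 'item'. To show what "sub-objects at the level you specify" means, here is a hypothetical, stdlib-only sketch of the idea. MiniObjectBuilder and items below are illustrative names, not ijson's code, and the events are hand-written in the (prefix, event, value) shape that ijson.parse() yields.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# (prefix, event, value): the tuple shape that ijson.parse() yields.
Event = Tuple[str, str, Any]


class MiniObjectBuilder:
    """Hypothetical, simplified stand-in for ijson's ObjectBuilder."""

    def __init__(self) -> None:
        self.value: Any = None
        self._stack: List[Any] = []  # open dicts/lists, innermost last
        self._keys: List[Any] = []   # pending map key for each open container

    def event(self, event: str, value: Any) -> None:
        if event == "map_key":
            self._keys[-1] = value
        elif event in ("end_map", "end_array"):
            self._stack.pop()
            self._keys.pop()
        else:
            if event == "start_map":
                item: Any = {}
            elif event == "start_array":
                item = []
            else:
                item = value  # scalar: string, number, boolean or null
            self._attach(item)
            if event in ("start_map", "start_array"):
                self._stack.append(item)
                self._keys.append(None)

    def _attach(self, item: Any) -> None:
        if not self._stack:
            self.value = item  # the outermost value being built
        elif isinstance(self._stack[-1], list):
            self._stack[-1].append(item)
        else:
            self._stack[-1][self._keys[-1]] = item


def items(events: Iterable[Event], prefix: str) -> Iterator[Any]:
    """Yield each complete value whose prefix equals `prefix`."""
    it = iter(events)
    for current, event, value in it:
        if current != prefix:
            continue
        if event in ("start_map", "start_array"):
            builder, depth = MiniObjectBuilder(), 0
            while True:  # feed events until this container closes
                builder.event(event, value)
                if event in ("start_map", "start_array"):
                    depth += 1
                elif event in ("end_map", "end_array"):
                    depth -= 1
                if depth == 0:
                    break
                current, event, value = next(it)
            yield builder.value
        elif event not in ("map_key", "end_map", "end_array"):
            yield value  # a scalar sitting directly at `prefix`


# Hand-written events for: [{"id": 1, "tags": ["a"]}, {"id": 2, "tags": []}]
SAMPLE_EVENTS: List[Event] = [
    ("", "start_array", None),
    ("item", "start_map", None),
    ("item", "map_key", "id"),
    ("item.id", "number", 1),
    ("item", "map_key", "tags"),
    ("item.tags", "start_array", None),
    ("item.tags.item", "string", "a"),
    ("item.tags", "end_array", None),
    ("item", "end_map", None),
    ("item", "start_map", None),
    ("item", "map_key", "id"),
    ("item.id", "number", 2),
    ("item", "map_key", "tags"),
    ("item.tags", "start_array", None),
    ("item.tags", "end_array", None),
    ("item", "end_map", None),
    ("", "end_array", None),
]

objects = list(items(SAMPLE_EVENTS, "item"))  # one dict per array element
ids = list(items(SAMPLE_EVENTS, "item.id"))   # just the "id" numbers
```

On the real file, the ijson equivalent of `objects` would be a loop over ijson.items(f, 'item') with f opened as open('file.json', 'rb'), which only keeps the element currently being built in memory.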
Yaroslav Stavnichiy answered Oct 03 '22 00:10