How to handle a huge stream of JSON dictionaries?

Tags: python, json

I have a file that contains a stream of JSON dictionaries like this:

{"menu": "a"}{"c": []}{"d": [3, 2]}{"e": "}"}

It also includes nested dictionaries and it looks like I cannot rely on a newline being a separator. I need a parser that could be used like this:

for d in getobjects(f):
  handle_dict(d)

The point is that it would be perfect if the iteration only happened at the root level. Is there a Python parser that would handle all JSON's quirks? I am interested in a solution that would work on files that wouldn't fit into RAM.

d33tah asked Jun 12 '15 17:06



1 Answer

I think JSONDecoder.raw_decode may be what you're looking for. You may have to do some string formatting to get it in the perfect format depending on newlines and such, but with a bit of work, you'll probably be able to get something working. See this example.

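import json

jstring = '{"menu": "a"}{"c": []}{"d": [3, 2]}{"e": "}"}'
substr = jstring.lstrip()
decoder = json.JSONDecoder()

while substr:
    # raw_decode parses one JSON value and returns it with the index where it ended
    data, index = decoder.raw_decode(substr)
    print(data)
    # drop the parsed value, plus any whitespace before the next one
    substr = substr[index:].lstrip()

Gives output (Python 3):

{'menu': 'a'}
{'c': []}
{'d': [3, 2]}
{'e': '}'}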
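
The example above works on a string that is already in memory. For a file that does not fit into RAM, the same raw_decode idea can be applied to a buffer that is refilled in chunks. Below is a rough sketch of the getobjects(f) generator from the question, not a tested library. The chunk_size parameter and the buffering strategy are my own assumptions. It also assumes that f is opened in text mode and that every top-level value is an object or array, because a bare number split across two chunks would be misread as two numbers.

```python
import io
import json


def getobjects(f, chunk_size=65536):
    """Yield top-level JSON values from a text-mode file object, one at a time.

    Sketch only: chunk_size is an arbitrary default, and top-level values
    are assumed to be objects or arrays.
    """
    decoder = json.JSONDecoder()
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        eof = not chunk  # read() returns '' at end of file
        buf += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so skip it here
            while pos < len(buf) and buf[pos].isspace():
                pos += 1
            if pos >= len(buf):
                break
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # truncated or malformed data at end of input
                break  # value is probably incomplete; read another chunk
            yield obj
        buf = buf[pos:]  # keep only the part that has not been parsed yet
        if eof:
            return


# Demo on an in-memory stream; a tiny chunk_size forces values to span chunks.
demo = io.StringIO('{"menu": "a"}{"c": []}\n {"d": [3, 2]}{"e": "}"}')
for d in getobjects(demo, chunk_size=5):
    print(d)
```

Memory use is bounded by the largest single top-level value plus one chunk, as long as the data is well formed. A malformed value in the middle of the file makes the buffer grow until end of input before the error is raised, so real code may want a cap on the buffer size.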
Brien answered Nov 11 '22 06:11