 

MemoryError using json.dumps()

I would like to know which of json.dump() or json.dumps() is more efficient when it comes to encoding a large array to JSON format.

Can you please show me an example of using json.dump()?

Actually, I am making a Python CGI script that gets a large amount of data from a MySQL database using the SQLAlchemy ORM, and after some user-triggered processing, I store the final output in an array that I finally convert to JSON.

But when converting to JSON with:

 print json.dumps({'success': True, 'data': data}) #data is my array

I get the following error:

Traceback (most recent call last):
  File "C:/script/cgi/translate_parameters.py", line 617, in     <module>
f.write(json.dumps(mytab,default=dthandler,indent=4))
  File "C:\Python27\lib\json\__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 209, in encode
    chunks = list(chunks)
MemoryError

So my guess is to use json.dump() to convert the data in chunks. Any ideas on how to do this?

Or are there other ideas besides using json.dump()?

asked Jun 16 '14 by salamey

2 Answers

You can simply replace

f.write(json.dumps(mytab,default=dthandler,indent=4))

with

json.dump(mytab, f, default=dthandler, indent=4)

This should "stream" the data into the file.
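
For context, here is a minimal end-to-end sketch of that approach; the sample data and the dthandler function below are placeholders standing in for whatever you already have:

import json
import datetime

# Placeholder data and default handler standing in for the question's mytab / dthandler
mytab = [{'id': 1, 'created': datetime.datetime(2014, 6, 16, 8, 0)}]

def dthandler(obj):
    # json cannot encode datetime objects natively, so convert them to ISO strings
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError('%r is not JSON serializable' % obj)

with open('output.json', 'w') as f:
    # json.dump() writes encoded chunks to the file object as it goes,
    # instead of building the whole JSON string in memory first
    json.dump(mytab, f, default=dthandler, indent=4)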

answered Sep 22 '22 by sebastian

The json module builds the entire JSON string in memory before writing, which is why the MemoryError occurs.

To get around this problem, use json.JSONEncoder().iterencode():

with open(filepath, 'w') as f:
    for chunk in json.JSONEncoder().iterencode(object_to_encode):
        f.write(chunk)

Note, however, that this will generally take quite a while, since it writes many small chunks rather than everything at once.
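
If the many small writes turn out to be the bottleneck, one workaround (my own sketch, not something the json module provides) is to buffer the chunks and flush them to the file in larger batches:

import json

def write_json_buffered(obj, f, buffer_size=65536):
    # Encode incrementally, but write to the file roughly once per buffer_size characters
    buf = []
    buf_len = 0
    for chunk in json.JSONEncoder().iterencode(obj):
        buf.append(chunk)
        buf_len += len(chunk)
        if buf_len >= buffer_size:
            f.write(''.join(buf))
            buf = []
            buf_len = 0
    f.write(''.join(buf))  # flush whatever is left

# Example usage with throwaway data
data = [{'prop': i, 'attr': i * 2} for i in range(100000)]
with open('output.json', 'w') as f:
    write_json_buffered(data, f)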


Special case:

I had a Python object which is a list of dicts, like so:

[
    { "prop": 1, "attr": 2 },
    { "prop": 3, "attr": 4 }
    # ...
]

I could json.dumps() individual objects, but dumping the whole list generated a MemoryError. To speed up writing, I opened the file and wrote the JSON delimiters manually:

import json

with open(filepath, 'w') as f:
    f.write('[')

    # dump every object but the last, each followed by a comma separator
    for obj in list_of_dicts[:-1]:
        json.dump(obj, f)
        f.write(',')

    # the last element gets no trailing comma before the closing bracket
    json.dump(list_of_dicts[-1], f)
    f.write(']')

You can probably get away with something like that if you know your JSON object structure beforehand. For general use, just use json.JSONEncoder().iterencode().
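
As a variant of the manual-delimiter trick (my own sketch, not part of the original answer), you can write the comma before every element except the first; this avoids copying the list with list_of_dicts[:-1] and also handles an empty list or any iterable:

import json

def dump_iterable_as_json_array(items, f):
    # Write items as a JSON array, one element at a time
    f.write('[')
    first = True
    for obj in items:
        if not first:
            f.write(',')  # separator goes before every element but the first
        json.dump(obj, f)
        first = False
    f.write(']')

# Example usage with a generator, so the full list never has to exist in memory
with open('output.json', 'w') as f:
    dump_iterable_as_json_array(({'prop': i, 'attr': i * 2} for i in range(100000)), f)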

answered Sep 24 '22 by Xavier Ho