 

Load a Single Large Python Dictionary Encoded as Json Without Killing Memory Usage?

Tags:

python

json

I've seen a lot of questions similar to this one, but nothing that really matched; most of them were about speed. My problem is memory: a single JSON dictionary that sits in a 1.1 GB file on my local box takes up all 16 GB of my RAM when I try to load it using anything along the lines of:

import json

with open(some_file, "rb") as f:
    new_dictionary = json.load(f)

This happens regardless of what json library I use (I've tried ujson, json, yajl), and regardless of whether I read things in as a byte stream or not. This makes absolutely no sense to me. What's with the crazy memory usage, and how do I get around it?

In case it helps, the dictionary is just a bunch of nested dictionaries that map ints to other ints. A sample looks like:

{"0":{"3":82,"4":503,"15":456},"956":{"56":823,"678":50673,"35":1232}...}
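JSON object keys are always strings, so the sample above loads as string keys mapping to dicts of string to int. The key conversion tried in UPDATE 2 can also be done while parsing, through the standard json module's object_pairs_hook parameter, so that a string-keyed copy never has to coexist with the int-keyed one. A minimal sketch; int_keys is a hypothetical helper name, and it assumes every key is a decimal integer string (int() raises ValueError otherwise). The test data is the visible part of the sample with the object closed off.

```python
import json
from typing import Any, Dict, List, Tuple


def int_keys(pairs: List[Tuple[str, Any]]) -> Dict[int, Any]:
    # object_pairs_hook is called with each decoded JSON object as a list of
    # (key, value) pairs; build the dict with the keys converted to int.
    return {int(key): value for key, value in pairs}


# Visible part of the question's sample, with the object closed off.
sample = '{"0":{"3":82,"4":503,"15":456},"956":{"56":823,"678":50673,"35":1232}}'
data = json.loads(sample, object_pairs_hook=int_keys)
print(data[956][678])  # 50673
```

The hook runs for every object, inner ones included, so both levels end up with int keys. As UPDATE 2 reports, this should not be expected to shrink the final structure; it only changes the key type.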

UPDATE: When I run this with simplejson, it actually only takes up 8 gigs. No idea why that one takes up so much less than all the others.

UPDATE 2: So I did some more investigation. I loaded up my dictionary with simplejson, and tried converting all the keys to ints (per Liori's suggestion that strings might take up more space). Space stayed the same at 8 gigs. Then I tried Winston Ewert's suggestion of running a gc.collect(). Space still remained at 8 gigs. Finally, annoyed and curious, I pickled my new data structure, exited Python, and reloaded. Lo and behold, it still takes up 8 gigs. I guess Python just wants that much space for a big 2d dictionary. Frustrating, for sure, but at least now I know it's not a JSON problem so long as I use simplejson to load it.
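One way to sanity-check the conclusion that Python simply needs that much space for a big 2-level dict is to total the per-object sizes. A rough sketch using only sys.getsizeof; deep_sizeof is a hypothetical helper that counts each object once by id, only walks dict/list/tuple/set containers, and ignores allocator overhead, so it approximates rather than equals process memory. The synthetic data is made up to mimic the question's shape.

```python
import json
import sys
from typing import Any, Optional, Set


def deep_sizeof(obj: Any, _seen: Optional[Set[int]] = None) -> int:
    """Approximate bytes used by obj and everything it contains.

    Each object is counted once (tracked by id), so shared objects such as
    CPython's cached small ints are not double-counted. Allocator overhead
    is ignored, so real process memory will be higher.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_sizeof(key, _seen) + deep_sizeof(value, _seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, _seen)
    return size


# Synthetic data in the question's shape: str keys -> dicts of str -> int.
nested = {str(i): {str(j): j * 37 for j in range(10)} for i in range(1000)}
total = deep_sizeof(nested)
json_len = len(json.dumps(nested))
print(f"{total} bytes in memory vs {json_len} characters of JSON")
```

Every entry is a separate key object and value object plus a slot in a per-dict hash table, and on CPython each of those typically costs tens of bytes, while the same entry in the JSON text is only a handful of characters.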

asked May 04 '12 at 21:05 by Eli



2 Answers

You could try a streaming API:

http://lloyd.github.com/yajl/

There are a couple of Python wrappers for it:

https://github.com/rtyler/py-yajl/

https://github.com/pykler/yajl-py
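The APIs of the two yajl wrappers are not reproduced here. As a stand-in that needs only the standard library, this sketch walks a single top-level JSON object and decodes one top-level value at a time with json.JSONDecoder.raw_decode, using the optional start-index argument that CPython's implementation accepts. It is not yajl and not an event-based parser: it assumes the document is one top-level object (as in the question), each top-level value must fit in memory, and iter_top_level_items is a hypothetical name.

```python
import io
import json
from typing import IO, Any, Iterator, Tuple

_WS = " \t\n\r"


def iter_top_level_items(fp: IO[str], chunk_size: int = 1 << 16) -> Iterator[Tuple[str, Any]]:
    """Yield (key, value) for each member of a top-level JSON object.

    Text is read in chunks; only the current top-level value (plus some
    lookahead) is held in memory at a time.
    """
    decoder = json.JSONDecoder()
    buf, pos, eof = "", 0, False

    def fill() -> bool:
        # Append the next chunk, discarding text that was already consumed.
        nonlocal buf, pos, eof
        if eof:
            return False
        chunk = fp.read(chunk_size)
        if not chunk:
            eof = True
            return False
        buf, pos = buf[pos:] + chunk, 0
        return True

    def peek() -> str:
        # Skip whitespace and return the next character, reading as needed.
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in _WS:
                pos += 1
            if pos < len(buf):
                return buf[pos]
            if not fill():
                raise ValueError("unexpected end of JSON input")

    def decode() -> Any:
        # Decode one value at pos. Retry with more text if it fails, or if it
        # is not yet followed by a delimiter: "1.5" split as "1." | "5" would
        # otherwise decode as 1.
        nonlocal pos
        while True:
            try:
                value, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if not fill():
                    raise
                continue
            nxt = end
            while nxt < len(buf) and buf[nxt] in _WS:
                nxt += 1
            if (nxt == len(buf) or buf[nxt] not in ",:}") and fill():
                continue
            pos = end
            return value

    if peek() != "{":
        raise ValueError("expected a top-level JSON object")
    pos += 1
    if peek() == "}":
        return
    while True:
        peek()
        key = decode()
        if not isinstance(key, str) or peek() != ":":
            raise ValueError("expected a string key followed by ':'")
        pos += 1
        peek()
        yield key, decode()
        sep = peek()
        pos += 1
        if sep == "}":
            return
        if sep != ",":
            raise ValueError("expected ',' or '}' between members")


sample = '{"0": {"3": 82, "4": 503}, "956": {"56": 823, "35": 1232}}'
for key, inner in iter_top_level_items(io.StringIO(sample), chunk_size=4):
    print(key, inner)  # handle one top-level entry, then let it go
```

With a real file you would pass `open(some_file, encoding="utf-8")` instead of the StringIO. Streaming only helps if the entries are processed and discarded, or written to something more compact; collecting every pair back into one dict should end up with roughly the same footprint described in the question. A top-level value is re-decoded from its start whenever more text is needed, so chunk_size should comfortably exceed the largest top-level value.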

answered Sep 28 '22 at 05:09 by Marco Mariani


A little experimentation on my part suggests that calling gc.collect() after the json object has been parsed drops memory usage to where it was when the object was originally constructed.

Here are the results I get for memory usage on a smaller scale:

Scenario                        No GC     GC
Build                           762912    763000
Standard json, unicode keys     885216    744552
Standard json, int keys         885216    744724
simplejson, unicode keys        894352    745520
simplejson, int keys            894352    744884

Basically, running gc.collect() appears to clean up some sort of garbage produced during JSON parsing.
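A minimal sketch of running this kind of before/after comparison yourself, assuming you only care about memory allocated by Python objects: it uses the standard tracemalloc module rather than whatever process-level measurement produced the numbers in this answer, so the figures will not be comparable, and the effect of gc.collect() may differ between Python versions. measure_parse is a hypothetical helper and the test data is synthetic.

```python
import gc
import json
import tracemalloc


def measure_parse(text: str) -> dict:
    """Parse JSON text and report traced Python memory before/after gc.collect().

    tracemalloc only sees allocations made through Python's allocators,
    so these numbers are not the process's resident memory.
    """
    tracemalloc.start()
    try:
        data = json.loads(text)
        before, _ = tracemalloc.get_traced_memory()
        collected = gc.collect()  # number of unreachable objects found
        after, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {"data": data, "before_gc": before, "after_gc": after,
            "peak": peak, "collected": collected}


text = json.dumps({str(i): {str(j): j for j in range(20)} for i in range(2000)})
result = measure_parse(text)
print("before gc:", result["before_gc"], "after gc:", result["after_gc"],
      "collected:", result["collected"])
```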

answered Sep 28 '22 at 07:09 by Winston Ewert