Huge memory usage of Python's json module?

Tags: python, json, memory-leaks

When I load a large JSON file with the json module, Python's memory usage spikes to about 1.8GB and I can't seem to get that memory released. I put together a very simple test case:

import json

with open("test_file.json", 'r') as f:
    j = json.load(f)

I'm sorry that I can't provide a sample JSON file; my test file has a lot of sensitive information. For context, I'm dealing with a file on the order of 240MB. After running the code above, I have the previously mentioned 1.8GB of memory in use. If I then do del j, memory usage doesn't drop at all. If I follow that with a gc.collect(), it still doesn't drop. I even tried unloading the json module and running another gc.collect().

I'm trying to run some memory profiling but heapy has been churning 100% CPU for about an hour now and has yet to produce any output.

Does anyone have any ideas? I've also tried the above using cjson rather than the packaged json module. cjson used about 30% less memory but otherwise displayed exactly the same issues.

I'm running Python 2.7.2 on Ubuntu server 11.10.

I'm happy to load up any memory profiler to see if it does better than heapy, and to provide any diagnostics you think are necessary. I'm hunting around for a large test JSON file that I can provide so that anyone else can give it a go.

asked Jun 15 '12 20:06

Endophage

1 Answer

I think these two links make some interesting points: this is not necessarily a json issue, but rather a "large object" issue that comes down to how Python's memory management interacts with the operating system.

See Why doesn't Python release the memory when I delete a large object? for why memory freed by Python is not necessarily reflected in what the operating system reports:

If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.

And on running the memory-hungry work in a subprocess so that the OS handles the cleanup:

The only really reliable way to ensure that a large but temporary use of memory DOES return all resources to the system when it's done, is to have that use happen in a subprocess, which does the memory-hungry work then terminates. Under such conditions, the operating system WILL do its job, and gladly recycle all the resources the subprocess may have gobbled up. Fortunately, the multiprocessing module makes this kind of operation (which used to be rather a pain) not too bad in modern versions of Python.
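As a rough illustration of the subprocess approach described above, here is a minimal sketch using multiprocessing. It assumes Python 3 (the question uses 2.7.2, where Pool cannot be used as a context manager), and the names count_top_level_items and run_in_subprocess are hypothetical helpers made up for this example. The point is that the worker parses the file and hands back only a small result, so the large parsed structure lives and dies inside the worker process.

```python
import json
import multiprocessing
import sys


def count_top_level_items(path):
    """Hypothetical worker task: parse the file, return only a small summary."""
    with open(path, "r") as f:
        data = json.load(f)
    # Only this integer is sent back to the parent; the parsed structure
    # stays in the worker and goes away when the worker process exits.
    return len(data)


def run_in_subprocess(func, *args):
    """Run func(*args) in a short-lived worker process and return its result."""
    with multiprocessing.Pool(processes=1) as pool:
        # apply() blocks until the worker has finished and returned a value.
        return pool.apply(func, args)
    # Leaving the with-block terminates the worker, so the operating system
    # reclaims all the memory the worker was holding.


if __name__ == "__main__":
    # Example invocation: python this_script.py test_file.json
    if len(sys.argv) > 1:
        print(run_in_subprocess(count_top_level_items, sys.argv[1]))
```

Whatever the worker returns is pickled and copied into the parent, so this only helps if the result is much smaller than the parsed document. Returning the whole structure would just move the memory use back into the parent process.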

answered Oct 16 '22 19:10

jdi

