memory use in large data-structures manipulation/processing

Question

I have a number of large (~100 MB) files which I process regularly. Although I try to delete unneeded data structures during processing, memory consumption is still a bit too high. I was wondering if there is a way to manipulate large data efficiently, e.g.:

class Processor:
    def read(self, filename):
        # read_100_mb_file is a placeholder for whatever loads the whole file
        fc = read_100_mb_file(filename)
        self.process(fc)

    def process(self, content):
        # do some processing of file content
        pass

Does passing fc to process duplicate the data structure? Would it be more memory-efficient to store it in an instance attribute like self.fc?

When should I use garbage collection? I know about the gc module, but should I call it explicitly, for example after I del fc?

Update: 100 MB is not a problem in itself, but float conversion and further processing add significantly more to both the working set and the virtual size (I'm on Windows).

Ryan Ginstrom · Accepted Answer

I'd suggest looking at the presentation by David Beazley on using generators in Python. This technique allows you to handle a lot of data, and do complex processing, quickly and without blowing up your memory use. IMO, the trick isn't holding a huge amount of data in memory as efficiently as possible; the trick is avoiding loading a huge amount of data into memory at the same time.
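To make the generator idea concrete, here is a minimal sketch, not taken from the presentation itself. It assumes the file is text with whitespace-separated numbers; the names read_lines, parse_floats and total are made up for illustration. Each stage pulls one item at a time from the previous one, so the full file and the full list of floats never exist in memory together.

```python
import os
import tempfile


def read_lines(filename):
    """Yield one line at a time instead of reading the whole file."""
    with open(filename) as f:
        for line in f:
            yield line


def parse_floats(lines):
    """Lazily convert whitespace-separated fields to floats."""
    for line in lines:
        for field in line.split():
            yield float(field)


def total(filename):
    # sum() consumes the pipeline item by item; no intermediate list is built.
    return sum(parse_floats(read_lines(filename)))


if __name__ == "__main__":
    # Small throwaway file standing in for the real 100 MB input.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("1.5 2.5\n3.0\n")
    try:
        print(total(tmp.name))  # 7.0
    finally:
        os.remove(tmp.name)
```

If the processing really needs random access to all values at once, this pattern does not apply directly, but many per-record computations (sums, filters, conversions written straight to an output file) fit it well.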

Crashworks · Answer

Before you start tearing your hair out over the garbage collector, you might be able to avoid that 100 MB hit of loading the entire file into memory by using a memory-mapped file object. See the mmap module.
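As a rough sketch of what that can look like (count_newlines is a made-up helper, and counting newlines is only a stand-in for real processing): the file is mapped read-only and searched in place, so the operating system pages data in on demand instead of Python holding one large bytes object.

```python
import mmap
import os
import tempfile


def count_newlines(filename):
    """Count newline bytes through a read-only memory map."""
    with open(filename, "rb") as f:
        # Length 0 maps the whole file. An empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n")
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1)
            return count


if __name__ == "__main__":
    # Small throwaway file standing in for the real 100 MB input.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"a\nb\nc\n")
    try:
        print(count_newlines(tmp.name))  # 3
    finally:
        os.remove(tmp.name)
```

The mapped object supports slicing and find much like bytes, so existing parsing code can often be pointed at it with few changes. Note that slicing copies the selected range into a new bytes object, so keep slices small.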

Tags: python, memory-leaks, data-structures, garbage-collection

SilentGhost
