How to write a memory efficient Python program?

Tags: python, memory-management, memory

It's said that Python manages memory automatically. I'm confused because I have a Python program that consistently uses more than 2GB of memory.

It's a simple multithreaded binary data downloader and unpacker.

import os
import struct
import threading
import urllib2

def GetData(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    data = response.read()  # data size is about 15MB
    response.close()
    count = struct.unpack("!I", data[:4])[0]  # unpack() returns a tuple
    for i in range(0, count):
        # UNPACK FIXED LENGTH OF BINARY DATA HERE
        yield (field1, field2, field3)

class MyThread(threading.Thread):
    def __init__(self, total, daterange, tickers):
        threading.Thread.__init__(self)

    def stop(self):
        self._Thread__stop()

    def run(self):
        # GET URL FOR EACH REQUEST
        data = []
        items = GetData(url)
        for item in items:
            data.append(';'.join(item))
        f = open(filename, 'w')
        f.write(os.linesep.join(data))
        f.close()

There are 15 threads running. Each request gets 15MB of data, unpacks it, and saves it to a local text file. How could this program consume more than 2GB of memory? Do I need to free memory myself in this case? How can I see how much memory each object or function uses?

I would appreciate any advice or tips on how to keep a Python program memory efficient.

Edit: Here is the output of "cat /proc/meminfo":

MemTotal:        7975216 kB
MemFree:          732368 kB
Buffers:           38032 kB
Cached:          4365664 kB
SwapCached:        14016 kB
Active:          2182264 kB
Inactive:        4836612 kB

asked Nov 02 '09 06:11

jack

2 Answers

The major culprit here, as others have pointed out, is the range() call. In Python 2 it builds the whole list up front. If count is on the order of 15 million, that is a list with 15 million members, which will eat up about 200 MB of your memory, and with 15 threads, that's 3GB.
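
To get a feel for the difference, here is a rough sketch comparing an eager list of integers with a lazy range. It is written for Python 3, where range() is already lazy like Python 2's xrange(), so list(range(n)) stands in for what Python 2's range() builds. The count n is an illustrative number, not one taken from the question, and the exact sizes depend on your interpreter and platform.

```python
import sys

n = 1000000  # illustrative count, NOT taken from the question

# What Python 2's range() builds: a real list with one slot per integer.
eager = list(range(n))

# Python 3 range (like Python 2 xrange): values are computed on demand.
lazy = range(n)

# getsizeof() reports the container only; the int objects that the list
# references cost additional memory on top of this figure.
eager_bytes = sys.getsizeof(eager)
lazy_bytes = sys.getsizeof(lazy)

print("list of ints: %d bytes for the list alone" % eager_bytes)
print("lazy range:   %d bytes" % lazy_bytes)
```

The lazy object stays the same small size no matter how large n gets, while the list grows with n.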

Also, don't read the whole 15MB response into data; read it piece by piece from the response instead. Holding all 15MB in one variable uses 15MB more memory per thread than reading it incrementally.

You might also consider simply extracting records until you run out of input data, then comparing the number of records you extracted with what the first bytes said it should be. Then you need neither range() nor xrange(). Seems more pythonic to me. :)
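
Roughly, that idea could look like the sketch below (Python 3). The record layout "!III" is purely hypothetical, since the question does not show the real format, and io.BytesIO stands in for the HTTP response. It also assumes read(n) returns fewer than n bytes only at the end of the stream; a stream that can return short reads would need an extra loop.

```python
import io
import struct

HEADER = struct.Struct("!I")    # 4-byte big-endian record count, as in the question
RECORD = struct.Struct("!III")  # HYPOTHETICAL layout: three unsigned 32-bit fields

def iter_records(stream):
    """Yield one tuple per record, reading RECORD.size bytes at a time."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        raise ValueError("stream ended before the record count")
    (expected,) = HEADER.unpack(header)

    seen = 0
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            break  # ran out of input data
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record at the end of the stream")
        seen += 1
        yield RECORD.unpack(chunk)

    # Compare what we extracted with what the first bytes promised.
    if seen != expected:
        raise ValueError("header promised %d records, got %d" % (expected, seen))

# Usage with an in-memory stand-in for the response body.
payload = HEADER.pack(2) + RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6)
records = list(iter_records(io.BytesIO(payload)))
print(records)
```

Only one record is held in memory at a time, and a count mismatch surfaces as an error once the stream is exhausted.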

answered Sep 29 '22 20:09

Lennart Regebro

Like others have said, you need at least the following two changes:

Do not create a huge list of integers with range

# use xrange
for i in xrange(0, count):
    # UNPACK FIXED LENGTH OF BINARY DATA HERE
    yield (field1, field2, field3)

Do not create a huge string holding the full file body to be written at once

# use writelines
f = open(filename, 'w')
f.writelines((datum + os.linesep) for datum in data)
f.close()

Even better, you could write the file as:

    items = GetData(url)
    f = open(filename, 'w')
    for item in items:
        f.write(';'.join(item) + os.linesep)
    f.close()
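
If you want to check the difference yourself, here is a rough sketch using tracemalloc from the Python 3 standard library (it does not exist in Python 2). The fake_items generator is a made-up stand-in for GetData(url), the item count is arbitrary, and output goes to os.devnull so nothing is written to disk. Treat the numbers as indicative only.

```python
import os
import tracemalloc

def fake_items(n):
    """HYPOTHETICAL stand-in for GetData(url): yields n small tuples of strings."""
    for i in range(n):
        yield (str(i), str(i * 2), str(i * 3))

def write_joined(out, items):
    # Approach from the question: collect every line, then build one big string.
    data = [';'.join(item) for item in items]
    out.write(os.linesep.join(data))

def write_streaming(out, items):
    # Approach from this answer: write each line as soon as it is produced.
    for item in items:
        out.write(';'.join(item) + os.linesep)

def peak_bytes(writer, n):
    """Return the peak traced allocation, in bytes, while writer runs."""
    with open(os.devnull, 'w') as out:
        tracemalloc.start()
        try:
            writer(out, fake_items(n))
            return tracemalloc.get_traced_memory()[1]  # (current, peak)
        finally:
            tracemalloc.stop()

joined_peak = peak_bytes(write_joined, 50000)
streaming_peak = peak_bytes(write_streaming, 50000)
print("joined peak:    %d bytes" % joined_peak)
print("streaming peak: %d bytes" % streaming_peak)
```

The joined version has to keep every line alive until the final write, so its peak grows with the number of items; the streaming version's peak should stay roughly flat.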

answered Sep 29 '22 19:09

tzot
