
How to write a memory efficient Python program?

It's said that Python automatically manages memory. I'm confused because I have a Python program that consistently uses more than 2GB of memory.

It's a simple multithreaded binary data downloader and unpacker.

import os
import struct
import threading
import urllib2

def GetData(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    data = response.read()  # data size is about 15MB
    response.close()
    count, = struct.unpack("!I", data[:4])  # unpack() returns a 1-tuple
    for i in range(0, count):
        # UNPACK FIXED LENGTH OF BINARY DATA HERE
        yield (field1, field2, field3)

class MyThread(threading.Thread):
    def __init__(self, total, daterange, tickers):
        threading.Thread.__init__(self)

    def stop(self):
        self._Thread__stop()

    def run(self):
        # GET URL FOR EACH REQUEST
        data = []
        items = GetData(url)
        for item in items:
            data.append(';'.join(item))
        f = open(filename, 'w')
        f.write(os.linesep.join(data))
        f.close()

There are 15 threads running. Each request gets 15MB of data, unpacks it, and saves it to a local text file. How could this program consume more than 2GB of memory? Do I need to do any memory recycling in this case? How can I see how much memory each object or function uses?

I would appreciate any advice or tips on how to keep a Python program memory efficient.

Edit: Here is the output of "cat /proc/meminfo"

MemTotal:        7975216 kB
MemFree:          732368 kB
Buffers:           38032 kB
Cached:          4365664 kB
SwapCached:        14016 kB
Active:          2182264 kB
Inactive:        4836612 kB
asked Nov 02 '09 06:11 by jack



2 Answers

The major culprit here is, as others have noted, the range() call. In Python 2 it builds a complete list up front. With 15 million members, that list eats up about 200 MB of memory, and with 15 threads, that's 3GB.

But also, don't read the whole 15MB response into data; read it bit by bit from the response. Holding all 15MB in a variable uses 15MB more memory than reading it piece by piece.

You might want to consider simply extracting data until you run out of input, and then comparing the number of records you extracted with what the first bytes said it should be. Then you need neither range() nor xrange(). Seems more pythonic to me. :)
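To get a feel for the difference, here is a rough sketch that compares a lazy range with a fully built list. It is written for Python 3, where range() is already lazy (like Python 2's xrange()), so list(range(...)) stands in for what Python 2's range() would build. The count value is made up for illustration.

```python
import sys

count = 1000000  # hypothetical record count, chosen only for illustration

lazy = range(count)         # Python 3 range: small object, values made on demand
eager = list(range(count))  # stand-in for the list Python 2's range() builds

# getsizeof() reports the container only; the int objects a list points to
# take additional memory on top of this.
lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
print(lazy_size, eager_size)
```

The exact numbers depend on the interpreter and platform, but the list grows with count while the lazy range does not.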
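Something along these lines, as a minimal sketch rather than a drop-in replacement. It is Python 3, uses io.BytesIO as a stand-in for the HTTP response, and assumes a made-up record layout ("!IHH"), since the question doesn't show the real one. The names read_exact and iter_records are mine.

```python
import io
import struct

RECORD = struct.Struct("!IHH")  # hypothetical fixed-length record layout

def read_exact(stream, n):
    """Read n bytes; returns fewer only when the input ends."""
    parts = []
    while n > 0:
        part = stream.read(n)  # a socket-backed read() may return short
        if not part:
            break
        parts.append(part)
        n -= len(part)
    return b"".join(parts)

def iter_records(stream):
    """Yield one record at a time until the input runs out."""
    header = read_exact(stream, 4)
    if len(header) < 4:
        raise ValueError("truncated header")
    (expected,) = struct.unpack("!I", header)
    seen = 0
    while True:
        chunk = read_exact(stream, RECORD.size)
        if not chunk:
            break  # clean end of input
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record")
        seen += 1
        yield RECORD.unpack(chunk)
    # Compare what we extracted with what the first bytes promised.
    if seen != expected:
        raise ValueError("header promised %d records, found %d" % (expected, seen))

# Usage with an in-memory stand-in for the response body.
payload = struct.pack("!I", 2) + RECORD.pack(1, 2, 3) + RECORD.pack(4, 5, 6)
records = list(iter_records(io.BytesIO(payload)))
```

Only one record's worth of bytes is held at a time, and no range() or xrange() is involved.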

answered Sep 29 '22 20:09 by Lennart Regebro


As others have said, you need at least the following two changes:

  1. Do not create a huge list of integers with range. In Python 2, xrange produces the values lazily instead:

    # use xrange
    for i in xrange(0, count):
        # UNPACK FIXED LENGTH OF BINARY DATA HERE
        yield (field1, field2, field3)
    
  2. Do not build the whole file body as one huge string before writing it:

    # use writelines
    f = open(filename, 'w')
    f.writelines((datum + os.linesep) for datum in data)
    f.close()
    

Even better, you could skip the intermediate data list entirely and write each item to the file as it arrives:

    items = GetData(url)
    f = open(filename, 'w')
    for item in items:
        f.write(';'.join(item) + os.linesep)
    f.close()
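
If you want to measure the effect rather than guess, here is a rough sketch that compares the peak allocation of the two write strategies using tracemalloc. That module is in the Python 3 standard library, so it is not available to the Python 2 code above. fake_items is a hypothetical stand-in for GetData(url), and the other function names are mine as well.

```python
import os
import tracemalloc

def fake_items(n):
    """Hypothetical stand-in for GetData(url): yields n three-field records."""
    for i in range(n):
        yield (str(i), "field2", "field3")

def write_joined(items, out):
    # Collect every line, then build one big string (the question's approach).
    data = [";".join(item) for item in items]
    out.write("\n".join(data))

def write_streaming(items, out):
    # Write one line at a time; nothing accumulates in memory.
    for item in items:
        out.write(";".join(item) + "\n")  # text mode translates "\n" itself

def peak_bytes(writer, n):
    """Return the peak traced allocation, in bytes, while writer runs."""
    with open(os.devnull, "w") as out:
        tracemalloc.start()
        writer(fake_items(n), out)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return peak

joined_peak = peak_bytes(write_joined, 20000)
streaming_peak = peak_bytes(write_streaming, 20000)
print(joined_peak, streaming_peak)
```

The absolute numbers vary by interpreter, but the joined version's peak grows with the number of records while the streaming version's stays roughly flat.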
answered Sep 29 '22 19:09 by tzot