Python: Memory usage and optimization when modifying lists

The problem

My concern is the following: I am storing a relatively large dataset in a plain Python list, and in order to process the data I must iterate over the list several times, perform some operations on the elements, and often pop an item out of the list.

It seems that deleting an item from a Python list costs O(N), since Python has to shift every element after the deleted one down a place. Furthermore, since the number of items to delete is roughly proportional to the number of elements in the list, this results in an O(N^2) algorithm.
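
For the record, a quick micro-benchmark makes the quadratic blow-up visible. This is only a sketch: the helper name and the odd/even test are illustrative stand-ins for dosomestuff/somecondition, and absolute times will vary by machine.

import timeit

def delete_in_place(n):
    lst = list(range(n))
    i = 0
    while i < len(lst):
        if lst[i] % 2:        # stand-in for somecondition(item)
            del lst[i]        # shifts every later element down: O(N)
        else:
            i += 1

for n in (10000, 20000, 40000):
    t = timeit.timeit(lambda: delete_in_place(n), number=1)
    print("%6d items: %.3f s" % (n, t))   # roughly quadruples as n doubles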

I am hoping to find a solution that is cost-effective, both time- and memory-wise. I have studied what I could find on the internet and have summarized my different options below. Which one is the best candidate?

Keeping a local index:

while processingdata:
    index = 0
    while index < len(somelist):
        item = somelist[index]
        dosomestuff(item)
        if somecondition(item):
            del somelist[index]  # O(N): shifts the whole tail down
        else:
            index += 1  # only advance when nothing was deleted

This is the original solution I came up with. Not only is it not very elegant, but I am hoping there is a better way to do it that remains time and memory efficient.

Walking the list backwards:

while processingdata:
    for i in xrange(len(somelist) - 1, -1, -1):
        item = somelist[i]  # this line was missing from my first draft
        dosomestuff(item)
        if somecondition(item):
            somelist.pop(i)  # safe: indices below i are unaffected

This avoids incrementing an index variable but ultimately has the same cost as the original version. It also breaks the logic of dosomestuff(item), which expects to process items in the same order as they appear in the original list.

Making a new list:

while processingdata:
    for item in somelist:
        dosomestuff(item)
    newlist = []
    for item in somelist:
        if not somecondition(item):  # keep only the survivors
            newlist.append(item)
    somelist = newlist
    gc.collect()  # the old list becomes garbage at this point

This is a very naive strategy for eliminating elements from a list, and it requires lots of memory since an almost-full copy of the list must be made.

Using list comprehensions:

while processingdata:
    for item in somelist:
        dosomestuff(item)
    somelist[:] = [x for x in somelist if not somecondition(x)]

This is very elegant, but under the covers it walks the whole list one more time and must copy most of the elements in it. My intuition is that this probably costs more than the original del statement, at least memory-wise. Keep in mind that somelist can be huge, and any solution that iterates through it only once per run will probably always win.
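
One subtlety worth noting: the slice assignment somelist[:] = ... mutates the existing list object, so any other references to it see the update, whereas plain somelist = ... would only rebind the name. Either way the comprehension builds a full temporary list first. A tiny demo with made-up values:

a = [1, 2, 3, 4]
b = a                            # second reference to the same list object
a[:] = [x for x in a if x % 2]   # in-place: the object's contents change
print(b)                         # [1, 3] -- b sees the update
a = [x for x in a if x > 1]      # rebinding: b still holds the old list
print(b)                         # [1, 3]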

Using the filter function:

while processingdata:
    for item in somelist:
        dosomestuff(item)
    somelist = filter(lambda x: not somecondition(x), somelist)

This also creates a new list occupying lots of RAM.

Using itertools' ifilterfalse function:

from itertools import ifilterfalse

while processingdata:
    for item in ifilterfalse(somecondition, somelist):
        dosomestuff(item)

This version of the filter call does not create a new list, but it will not call dosomestuff on every item, breaking the logic of the algorithm. I am including this example only for the sake of an exhaustive list.

Moving items up the list while walking:

while processingdata:
    index = 0
    for item in somelist:
        dosomestuff(item)
        if not somecondition(item):
            somelist[index] = item  # compact survivors toward the front
            index += 1
    del somelist[index:]  # truncate the leftover tail in one operation

This is a subtle method that seems cost-effective. I think it will move each item (or rather the pointer to each item) exactly once, resulting in an O(N) algorithm. Finally, I hope Python is intelligent enough to resize the list at the end without allocating memory for a new copy of the list. I am not sure, though.
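
For what it is worth, CPython does shrink the underlying allocation when a del-slice drops the length far enough below the allocated capacity, and realloc can usually do that without copying; this is an implementation detail, though, not a language guarantee. A quick, hedged way to observe it:

import sys

lst = list(range(1000000))
print(sys.getsizeof(lst))   # e.g. about 8 MB of pointer slots on 64-bit
del lst[1000:]
print(sys.getsizeof(lst))   # far smaller: the allocation was shrunk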

Abandoning Python lists:

class Doubly_Linked_List:
    def __init__(self):
        self.first = None
        self.last = None
        self.n = 0
    def __len__(self):
        return self.n
    def __iter__(self):
        return DLLIter(self)
    def iterator(self):
        return self.__iter__()
    def append(self, x):
        x = DLLElement(x)
        x.next = None
        if self.last is None:
            x.prev = None
            self.last = x
            self.first = x
            self.n = 1
        else:
            x.prev = self.last
            x.prev.next = x
            self.last = x
            self.n += 1

class DLLElement:
    def __init__(self, x):
        self.next = None
        self.data = x
        self.prev = None

class DLLIter:
    etc...

This type of object resembles a Python list in a limited way, but deletion of an element is guaranteed O(1). I would rather not go here, since this would require massive amounts of code refactoring almost everywhere.
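
For completeness, here is a hedged sketch of the pieces elided above: the DLLIter class and an O(1) unlink. The helper name dll_remove is my own invention, not part of the original code.

class DLLIter:
    """Forward iterator over a Doubly_Linked_List."""
    def __init__(self, dll):
        self.node = dll.first
    def __iter__(self):
        return self
    def next(self):                     # spelled __next__ under Python 3
        if self.node is None:
            raise StopIteration
        data = self.node.data
        self.node = self.node.next
        return data
    __next__ = next

def dll_remove(dll, node):
    # Unlink node in O(1): no neighbours are moved, only relinked.
    if node.prev is None:
        dll.first = node.next
    else:
        node.prev.next = node.next
    if node.next is None:
        dll.last = node.prev
    else:
        node.next.prev = node.prev
    dll.n -= 1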

asked Apr 13 '10 by xApple



2 Answers

Without knowing the specifics of what you're doing with this list, it's hard to know exactly what would be best in this case. If your processing stage depends on the current index of the list element, this won't work, but if not, it appears you've left off the most Pythonic (and in many ways, easiest) approach: generators.

If all you're doing is iterating over each element, processing it in some way, then either including that element in the list or not, use a generator. Then you never need to store the entire iterable in memory.

def process_and_generate_data(source_iterable):
    for item in source_iterable:
        dosomestuff(item)
        if not somecondition(item):
            yield item  # lazily emit survivors; nothing is stored

You would need a processing loop that deals with persisting the processed iterable (writing it back to a file, or whatever). If you have multiple processing stages that you would prefer to separate into different generators, you can have your processing loop pass one generator to the next, as in the sketch below.
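
A minimal sketch of such a loop, reusing process_and_generate_data from above; stage_two, domorestuff, someothercondition and the output file name are illustrative inventions, not anything the question defines:

def stage_two(source_iterable):
    for item in source_iterable:
        domorestuff(item)                # hypothetical second pass
        if not someothercondition(item):
            yield item

with open('survivors.txt', 'w') as out:
    pipeline = stage_two(process_and_generate_data(somelist))
    for item in pipeline:                # only one item is in flight at a time
        out.write('%s\n' % item)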

answered Oct 01 '22 by Jeffrey Harris


From your description it sounds like a deque ("deck") would be exactly what you are looking for:

http://docs.python.org/library/collections.html#deque-objects

"Iterate" across it by repeatedly calling pop() and then, if you want to keep the popped item in the deque, returning that item to the front with appendleft(item). To keep up with when you're done iterating and have seen everything in the deque, either put in a marker object like None that you watch for, or just ask for the deque's len() when you start a particular loop and use range() to pop() exactly that many items.

I believe you will find all of the operations you need are then O(1).
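
A minimal sketch of the scheme just described, borrowing processingdata, dosomestuff and somecondition from the question, and using the len()/range() bookkeeping rather than a sentinel:

from collections import deque

work = deque(somelist)
while processingdata:
    for _ in range(len(work)):      # exactly one full pass over the deque
        item = work.pop()           # O(1) from the right end
        dosomestuff(item)
        if not somecondition(item):
            work.appendleft(item)   # O(1): survivor goes back on the left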

answered Oct 01 '22 by Brandon Rhodes