 

Pythonic pattern for building up parallel lists

I am new-ish to Python and I am finding that I am writing the same pattern of code over and over again:

def foo(values):
    results = []
    for n in values:
        # do some or a lot of processing on n, possibly using other variables
        nprime = operation(n)
        results.append(nprime)
    return results

I am thinking in particular about the creation of the empty list followed by the append call. Is there a more Pythonic way to express this pattern? append might not have the best performance characteristics, but I am not sure how else I would approach it in Python.

I often know the exact length of my output in advance, so calling append each time seems like it might cause memory fragmentation or performance problems, but I am also wondering if that is just my old C habits tripping me up. I am writing a lot of text-parsing code that isn't performance-sensitive in any particular loop, because all of the real work happens inside gensim or NLTK code and is in much more capable hands than mine.

Is there a better or more Pythonic pattern for this type of operation?

asked Dec 15 '22 by David
2 Answers

First, a list comprehension may be all you need (assuming all the processing mentioned in your comment occurs in operation):

def foo(values):
    return [operation(n) for n in values]

If a list comprehension will not work in your situation, consider whether foo really needs to build the list and could be a generator instead.

def foo(values):
    for n in values:
        # Processing...
        yield operation(n)

In this case, you can iterate over the sequence, and each value is calculated on demand:

for x in foo(myList):
    ...

or you can let the caller decide if a full list is needed:

results = list(foo(myList))

If neither of the above is suitable, then building up the return list in the body of the loop as you are now is perfectly reasonable.
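To make the laziness of the generator version concrete, here is a small sketch (using a squaring step as a stand-in for the hypothetical operation): only the values actually consumed are ever computed, even when the input is enormous.

```python
from itertools import islice

def foo(values):
    for n in values:
        yield n * n  # stand-in for operation(n)

# Only the first three values are ever computed, even for a huge input:
first_three = list(islice(foo(range(10**9)), 3))
print(first_three)  # [0, 1, 4]
```

If the caller does want everything, `list(foo(myList))` materializes the full result, so the generator form costs you no flexibility.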

answered Dec 29 '22 by chepner


[..] so calling append each time seems like it might be causing memory fragmentation, or performance problems, but I am also wondering if that is just my old C ways tripping me up.

If you are worried about this, don't be. When a list needs to grow, Python over-allocates the new storage so that appends run in amortized O(1) time. Whether you call list.append manually or build the list with a comprehension (which internally also appends), the memory behavior is similar.
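You can observe the over-allocation directly with sys.getsizeof (the exact numbers vary by Python version and platform, but the stepped pattern is consistent on CPython):

```python
import sys

lst = []
sizes = []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# The reported size stays flat for several appends, then jumps:
# the list over-allocates, so most appends need no reallocation.
print(sizes)
```

Twenty appends trigger only a handful of actual reallocations, which is why appending in a loop is not the memory-fragmentation hazard it would be with naive realloc-per-element C code.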

The list comprehension just performs a bit better speed-wise: it is compiled to specialized bytecode instructions (mainly LIST_APPEND, which calls the list's append directly in C).
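You can see that instruction for yourself with the dis module; the following sketch disassembles a small comprehension and checks for LIST_APPEND in the output (this is CPython-specific, since other implementations need not use the same bytecode):

```python
import dis
import io

def squares(values):
    return [n * n for n in values]

# Capture the disassembly as text and look for the specialized opcode.
buf = io.StringIO()
dis.dis(squares, file=buf)
print("LIST_APPEND" in buf.getvalue())  # True on CPython
```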

Of course, if memory usage is a concern, you can always opt for the generator approach highlighted in chepner's answer to produce your results lazily.


In the end, for loops are still great. They might seem clunky in comparison to comprehensions and maps but they still offer a recognizable and readable way to achieve a goal. for loops deserve our love too.

answered Dec 29 '22 by Dimitris Fasarakis Hilliard