Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List Comprehensions in Python : efficient selection in a list

Let's suppose that I have a list of elements, and I want to select only some of them, according to a certain function (for example a distance to an other element).

I want to have as a result a list of tuple, with the distance and the element. So, I wrote the following code

result = [ ( myFunction(C), C) for C in originalList if myFunction(C) < limit ]

But myFunction is a very time-consuming function, and the originalList quite big. So doing like that, myFunction will be call twice for every selected element.

So, is there a way to avoid this ??

I have two other possibilities, but they are not so good:

  1. the first one, is to create the unfiltered list

    unfiltered = [ (myFunction(C),C) for C in originalList ]
    

    and then sort it

    result = [ (dist,C) for dist,C in unfiltered if dist < limit ]
    

    but in that case, I duplicate my originalList and waste some memory (the list could be quite big - more than 10,000 elements)

  2. the second one is tricky and not very pythonic, but efficient (the best we can do, since the function should be evaluated once per element). myFunction stores it last
    result in a global variable (lastResult for example), and this value is re-used in the List comprehension

    result = [ (lastResult,C) for C in originalList if myFunction(C) < limit ]
    

Do you have any better idea to achieve that, in an efficient and pythonic way ??

Thanks for your answers.

like image 507
ThibThib Avatar asked Aug 03 '09 14:08

ThibThib


2 Answers

Some options:

  • Use memoization
  • Use a normal for loop
  • Create an unfiltered list, then filter it (your option 1). The 'wasted' memory will be reclaimed by the GC very quickly - it's not something you need to worry about.
like image 99
JoeG Avatar answered Oct 21 '22 12:10

JoeG


Sure, the difference between the following two:

[f(x) for x in list]

and this:

(f(x) for x in list)

is that the first will generate the list in memory, whereas the second is a new generator, with lazy evaluation.

So, simply write the "unfiltered" list as a generator instead. Here's your code, with the generator inline:

def myFunction(x):
    print("called for: " + str(x))
    return x * x

originalList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
limit = 10
result =   [C2 for C2 in ((myFunction(C), C) for C in originalList) if C2[0] < limit]
# result = [C2 for C2 in [(myFunction(C), C) for C in originalList] if C2[0] < limit]

Note that you will not see a difference in the printout from the two, but if you were to look at memory usage, the second statement which is commented out, will use more memory.

To do a simple change to your code in your question, rewrite unfiltered as this:

unfiltered = [ (myFunction(C),C) for C in originalList ]
             ^                                         ^
             +---------- change these to (..) ---------+
                                 |
                                 v
unfiltered = ( (myFunction(C),C) for C in originalList )
like image 34
Lasse V. Karlsen Avatar answered Oct 21 '22 13:10

Lasse V. Karlsen