Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python list comprehension- "pop" result from original list?

Say I have a list in Python 3.X. I use list comprehension to return a subset of that list--- is there an easy/Pythonic way to "pop" that subset off of the original list (so the elements in that subset are no longer in the original list after being returned)? Thanks!

Example (I need help in defining my_func):

a = [1, 2, 2, 3, 4, 5]
a, b = my_func(a, *kwargs)

I'd then want:

a = [1, 2, 2]
b = [3, 4, 5]

Note: I do not want to pop the values off one at a time, but rather the whole subset at once. "Pop" may not be the best terminology, but I don't know what is.

The best way I can think of doing it is:

 temp = [idx for idx, val in enumerate(a) if val > 2]  
 b = [a[i] for i in temp]  
 a = [val for idx,val in enumerate(a) if idx not in temp]  

Obviously, I would prefer something more elegant and efficient. I want to avoid going over the list twice.

EDIT: I think this question is unique because I am not simply concerned with splitting the list - I want to modify the original list. Splitting and assigning one of those splits to the original list variable is a possible solution, but I was hoping there might be a way to do it without explicitly assigning to the original list variable (similar to something like b.append(a.pop()))

like image 367
TaylerJones Avatar asked Jan 12 '15 17:01

TaylerJones


2 Answers

Just filter out the unwanted items using list comprehension:

a = [1, 2, 2, 3, 4, 5]
condition = lambda x: x > 2
b = [x for x in a if condition(x)]      # b == [3, 4, 5] 
a = [x for x in a if not condition(x)]  # a == [1, 2, 2]

Update

If you are worrying about efficiency then here is another approach which scans the list only once:

def classify(iterable, condition):
    """ Returns a list that matches the condition and a list that
    does not """
    true_list = []
    false_list = []
    for item in iterable:
        if condition(item):
            true_list.append(item)
        else:
            false_list.append(item)
    return true_list, false_list

a = [1, 2, 2, 3, 4, 5]
b, a = classify(a, lambda x: x > 2)
print(a)  # [1, 2, 2]
print(b)  # [3, 4, 5]

Update 2

I did not realize that my update looks almost the same to that of user3467349, but believe it or not, I did not cheat :-)

like image 158
Hai Vu Avatar answered Oct 24 '22 23:10

Hai Vu


Keep in mind this is Python, not C, and sometimes the way it works is not necessarily intuitive. This means you have to verify your assumptions. I've done this using IPython's built-in %%timeit magic.

a = [1,2,2,3,4,5]
%timeit b=[x for x in a if x > 2]\
1000000 loops, best of 3: 362 ns per loop
%timeit c=[x for x in a if not (x > 2)]
1000000 loops, best of 3: 371 ns per loop

So just under 800 ns for both, but we're iterating over the loop twice. Surely we don't need to do that? How about a couple of the above methods? Let's start with classify, which is pretty straightforward and only walks over the list once:

%timeit b, a = classify(a, lambda x: x > 2)
1000000 loops, best of 3: 1.89 µs per loop

Wow, even though it only walks over the loop once, it takes more than twice as long as the simple solution above, which walks it twice. Let's try a slight variation on the other solution proposed above:

%%timeit
b, c = [], []
for x in a:
    b.append(x) if x > 2 else a.append(x)
1000000 loops, best of 3: 1.2 µs per loop

Better, but it's still slower than our 'naive'/'inefficient' implementation. Maybe a little different formulation would be better:

%%timeit
b, c = [], []
for x in a:
    if x > 2:
        b.append(x)
    else:
        c.append(x)
1000000 loops, best of 3: 983 ns per loop

Hmm, that seems almost the same, but is a little faster. Still doesn't beat the naive implementation. Let's get really clever, maybe make it even faster:

%%timeit
b, c = [], []
for x in a:
    (b, c)[x > 2].append(x)
1000000 loops, best of 3: 1.28 µs per loop

So, what we saw is that despite our desire not to iterate the loop twice, we don't seem to be able to improve on just doing two list comprehensions. List comprehensions are sort of 'under the hood' in Python, and so for many things they are going to faster than anything you come up with.

Now, the x < 2 comparison is cheap - no function calls, simple math. There will be a point where it starts to make sense. Let's create a more expensive comparison function - we'll calculate the factorial (and do it inefficiently):

def factorial(x):
    if x in (0,1):
        return 1
    else:
        return x * factorial(x-1)

%timeit b = [x for x in a if factorial(x) > 6]
100000 loops, best of 3: 3.47 µs per loop
%timeit c = [x for x in a if not factorial(x) > 6]
100000 loops, best of 3: 3.53 µs per loop

So, obviously the times went up a good bit - now are at about 7uS for everything.

Let's try some of our other examples now:

%timeit b, c = classify(a, lambda x: factorial(x) > 6)
100000 loops, best of 3: 5.05 µs per loop

%%timeit
b, c = [], []
for x in a:
    if factorial(x) > 6:
        b.append(x)
    else:
        c.append(x)
100000 loops, best of 3: 4.01 µs per loop

%%timeit
b, c = [], []
for x in a:
    (b, c)[factorial(x) > 6].append(x)
100000 loops, best of 3: 4.59 µs per loop

The lesson from all of this: When dealing with python & efficiency, it's usually a good idea to just try it out in a console, but very often the naive implementations are reasonably performant & the easiest to read. And a quick test will tell you if trying to optimize is actually worth it; you can often make it less readable AND slower if you're not careful....

Addendum: Someone commented that we need longer lists, because we're measuring function call overhead more than performance. They have a good point, but the times show about the same relationship on my machine:

In [16]: a = range(100000)

In [17]: random.shuffle(a)

In [18]: %timeit b = [x for x in a if x > 50000]
100 loops, best of 3: 5.2 ms per loop

In [19]: %timeit c = [x for x in m if not x > 50000]
100 loops, best of 3: 5.18 ms per loop

In [20]: %timeit c = [x for x in m if x <= 50000]
100 loops, best of 3: 5.35 ms per loop

In [21]: %%timeit
   ....: b, c = [], []
   ....: for x in a:
   ....:     if x > 50000:
   ....:         b.append(x)
   ....:     else:
   ....:         c.append(x)
   ....:
100 loops, best of 3: 12.7 ms per loop

Note that if I change the comparison to x > 2 (instead of x > 50000) the second loop sped up to about 11.4ms. But still, that's barely any faster than the naive implementation. That said, it's probably the one I'd prefer - it's still easy to read, and it's not slower (or significantly so). And code tends to get more complex over time, so when you later add another comparison, or change it to a function call, it's easier to do, and the performance will suffer less in this version than the naive version.

But again: This isn't to say 'use list comprehensions' (though they are very often a good idea), but verify your assumptions.

like image 33
Corley Brigman Avatar answered Oct 24 '22 22:10

Corley Brigman