Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List Manipulation in Python with pop()

Tags:

python

list

In short, I need to remove multiple items from a list according to their indexes. However, I can't use pop because it shifts the indexes (without some clumsy compensating system). Is there a way to remove multiple items simultaneously?

I have an algorithm that goes through a list, and if the conditions are right removes that item via the pop method. A problem arises seeing as this is all done in a loop. Once pop is done the list is shortened by one, displacing all the values by one. So the loop will go out of range. Is it possible to remove multiple items simultaneously, or another solution?

An example of my problem:

L = ['a', 'b', 'c', 'd']

for i in range(len(L)):
    print L
    if L[i] == 'a' or L[i] == 'c':
        L.pop(i)
like image 608
rectangletangle Avatar asked Mar 02 '11 03:03

rectangletangle


People also ask

Can you use pop on a list Python?

List pop in Python is a pre-defined, in-built function that removes an item at the specified index from the list. You can also use pop in Python without mentioning the index value. In such cases, the pop() function will remove the last element of the list.

What is the use of pop () in a list?

The pop() method removes the item at the given index from the list and returns the removed item.

What does Pop () in Python?

Python Dictionary pop() Method The pop() method removes the specified item from the dictionary. The value of the removed item is the return value of the pop() method, see example below.


3 Answers

Are your lists large? If so, use ifilter from itertools to filter out elements that you don't want lazily (with no up front cost).

Lists not so large? Just use a list comprehension:

 newlist = [x for x in oldlist if x not in ['a', 'c'] ]

This will create a new copy of the list. This is not generally an issue for efficiency unless you really care about memory consumption.

As a happy medium of syntax convenience and laziness ( = efficiency for large lists), you can construct a generator rather than a list by using ( ) instead of [ ]:

interestingelts = (x for x in oldlist if x not in ['a', 'c'])

After this, you can iterate over interestingelts, but you can't index into it:

 for y in interestingelts:    # ok
    print y

 print interestingelts[0]     # not ok: generator allows sequential access only
like image 100
phooji Avatar answered Nov 08 '22 14:11

phooji


You want a list comprehension:

L = [c for c in L if c not in ['a', 'c']]

Or, if you really don't want to create a copy, go backwards:

for i in reversed(range(len(L))):
    if L[i] in ['a', 'c']:
        L.pop(i)    # del L[i] is more efficient

Thanks to ncoghlan for reversed() & phooji for del L[i] suggestions. (I decided to leave it as L.pop(i), since that's how the question was initially formulated.)

Also, as J.S. Sebastian correctly points out, going backwards is space efficient but time inefficient; most of the time a list comprehension or generator (L = (...) instead of L = [...]) is best.

Edit:

Ok, so since people seem to want something less ridiculously slow than the reversed method above (I can't imagine why... :) here's an order-preserving, in-place filter that should differ in speed from a list comprehension only by a constant. (This is akin to what I'd do if I wanted to filter a string in c.)

write_i = 0
for read_i in range(len(L)):
    L[write_i] = L[read_i]
    if L[read_i] not in ['a', 'c']:
         write_i += 1

del L[write_i:]
print L
# output: ['b', 'd']
like image 44
senderle Avatar answered Nov 08 '22 16:11

senderle


Summary

  • use list comprehension (or genexpr) to remove multiple items from a list
  • if your input is a large byte-string then use str.translate() to delete characters
  • deleting one item at a time del L[i] is slow for large lists

If items are bytes as in your example you could use str.translate():

def remove_bytes(bytestr, delbytes):
    """
    >>> remove_bytes(b'abcd', b'ac') == b'bd'
    True
    """
    return bytestr.translate(None, delbytes)

In general multiple items could be removed using slicing:

def remove_inplace_without_order(L, delitems):
    """Remove all items from `L` that are in `delitems` (not preserving order).

    >>> L = list(range(4)); remove_inplace_without_order(L, [0,2]); L
    [3, 1]
    """
    idel = len(L) # items idel.. to be removed
    for i in reversed(range(len(L))):
        if L[i] in delitems:
            idel -= 1
            L[i] = L[idel] # save `idel`-th item
    del L[idel:] # remove items all at once
    #NOTE: the function returns `None` (it means it modifies `L` inplace)

As @phooji and @senderle already mentioned list comprehension (or generator expression) is preferable in your case:

def remove_listcomp(L, delitems):
    return [x for x in L if x not in delitems]

Here's a performance comparison for L=list("abcd"*10**5); delitems="ac":

| function                     | time, msec |  ratio |
|------------------------------+------------+--------|
| list                         |       4.42 |    0.9 |
| remove_bytes                 |       4.88 |    1.0 |
| remove                       |       27.3 |    5.6 |
| remove_listcomp              |       36.8 |    7.5 |
| remove_inplace_without_order |       71.2 |   14.6 |
| remove_inplace_senderle2     |       83.8 |   17.2 |
| remove_inplace_senderle      |      15000 | 3073.8 |
#+TBLFM: $3=$2/@3$2;%.1f

Where

try:
    from itertools import ifilterfalse as filterfalse
except ImportError:
    from itertools import filterfalse # py3k

def remove(L, delitems):
    return filterfalse(delitems.__contains__, L)

def remove_inplace_senderle(L, delitems):
    for i in reversed(range(len(L))):
        if L[i] in delitems:
            del L[i]

def remove_inplace_senderle2(L, delitems):
    write_i = 0
    for read_i in range(len(L)):
        L[write_i] = L[read_i]
        if L[read_i] not in delitems:
             write_i += 1
    del L[write_i:]

remove_inplace_senderle() is slow due to it uses O(N**2) algorithm. Each del L[i] might cause all items to the right to be moved left to close the gap.

The time column in the above table includes time it takes to create a new input list (the first row) due to some algorithms modify an input inplace.

Here's timings for the same input but without creating a new list on each iteration:

 | function        | time, msec | ratio |
 |-----------------+------------+-------|
 | remove_bytes    |      0.391 |     1 |
 | remove          |       24.3 |    62 |
 | remove_listcomp |       33.4 |    85 |
 #+TBLFM: $3=$2/@2$2;%d

The table shows that itertools.ifilterfalse() doesn't provide a significant improvement over listcomp.

In general it is not worth it or even harmful to think about performance for such tasks unless a profiler proven that this code is a bottleneck and it is important for your program. But it might be useful to be aware of alternative approaches that could provide more than an order of magnitude improvement in speed.

like image 25
jfs Avatar answered Nov 08 '22 15:11

jfs