Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove all values within one list from another list? [duplicate]

Tags:

python

list

People also ask

How do you remove items from a list from one list to another in Python?

How to Remove an Element from a List Using the remove() Method in Python. To remove an element from a list using the remove() method, specify the value of that element and pass it as an argument to the method. remove() will search the list to find it and remove it.

How do you remove an element from one list to another list?

The remove() method removes the first matching element (which is passed as an argument) from the list. The pop() method removes an element at a given index, and will also return the removed item. You can also use the del keyword in Python to remove an element or slice from a list.


>>> a = range(1, 10)
>>> [x for x in a if x not in [2, 3, 7]]
[1, 4, 5, 6, 8, 9]

I was looking for fast way to do the subject, so I made some experiments with suggested ways. And I was surprised by results, so I want to share it with you.

Experiments were done using pythonbenchmark tool and with

a = range(1,50000) # Source list
b = range(1,15000) # Items to remove

Results:

 def comprehension(a, b):
     return [x for x in a if x not in b]

5 tries, average time 12.8 sec

def filter_function(a, b):
    return filter(lambda x: x not in b, a)

5 tries, average time 12.6 sec

def modification(a,b):
    for x in b:
        try:
            a.remove(x)
        except ValueError:
            pass
    return a

5 tries, average time 0.27 sec

def set_approach(a,b):
    return list(set(a)-set(b))

5 tries, average time 0.0057 sec

Also I made another measurement with bigger inputs size for the last two functions

a = range(1,500000)
b = range(1,100000)

And the results:

For modification (remove method) - average time is 252 seconds For set approach - average time is 0.75 seconds

So you can see that approach with sets is significantly faster than others. Yes, it doesn't keep similar items, but if you don't need it - it's for you. And there is almost no difference between list comprehension and using filter function. Using 'remove' is ~50 times faster, but it modifies source list. And the best choice is using sets - it's more than 1000 times faster than list comprehension!


If you don't have repeated values, you could use set difference.

x = set(range(10))
y = x - set([2, 3, 7])
# y = set([0, 1, 4, 5, 6, 8, 9])

and then convert back to list, if needed.


a = range(1,10)
itemsToRemove = set([2, 3, 7])
b = filter(lambda x: x not in itemsToRemove, a)

or

b = [x for x in a if x not in itemsToRemove]

Don't create the set inside the lambda or inside the comprehension. If you do, it'll be recreated on every iteration, defeating the point of using a set at all.


The simplest way is

>>> a = range(1, 10)
>>> for x in [2, 3, 7]:
...  a.remove(x)
... 
>>> a
[1, 4, 5, 6, 8, 9]

One possible problem here is that each time you call remove(), all the items are shuffled down the list to fill the hole. So if a grows very large this will end up being quite slow.

This way builds a brand new list. The advantage is that we avoid all the shuffling of the first approach

>>> removeset = set([2, 3, 7])
>>> a = [x for x in a if x not in removeset]

If you want to modify a in place, just one small change is required

>>> removeset = set([2, 3, 7])
>>> a[:] = [x for x in a if x not in removeset]

Others have suggested ways to make newlist after filtering e.g.

newl = [x for x in l if x not in [2,3,7]]

or

newl = filter(lambda x: x not in [2,3,7], l) 

but from your question it looks you want in-place modification for that you can do this, this will also be much much faster if original list is long and items to be removed less

l = range(1,10)
for o in set([2,3,7,11]):
    try:
        l.remove(o)
    except ValueError:
        pass

print l

output: [1, 4, 5, 6, 8, 9]

I am checking for ValueError exception so it works even if items are not in orginal list.

Also if you do not need in-place modification solution by S.Mark is simpler.