Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to delete a list within a list (i.e., a sublist) if any element of that sublist is in another list?

I have a list that contains a number of sublists. For example:

full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]

I also have another list, called omit. For example:

omit = [99, 60, 98]

I want to remove the sublists inside of full_list, if any element in that sublist is in the omit list. For example, I would want the resulting list to be:

reduced_list = [[1, 1, 3, 4], [2, 4, 4]]

because only these sublists do not have an element that is in the omit list.

I am guessing that there is some easy way to pull this off with a list comprehension but I cannot get it to work. I have tried a bunch of things: For example:

reduced_list = [sublist for sublist in full_list if item for sublist not in omit] 
  • this code results in an error (invalid snytax) - but I think I'm missing more than that.

Any help would be much appreciated!

p.s., The above is a simplified problem. My end goal is to remove sublists from a very long list (e.g., 500,000 sublists) of strings if any element (a string) of those sublists is in an "omit" list contain over 2000 strings.

like image 263
ansonw Avatar asked Feb 15 '23 08:02

ansonw


1 Answers

Use set and all():

>>> omit = {99, 60, 98}
>>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
>>> [item for item in full_list if all(x not in omit for x in item)]
[[1, 1, 3, 4], [2, 4, 4]]

Main difference between this method and @alecxe's(or @Óscar López's) solution is that it all short-circuits and doesn't create any set or list in the memory while set-intersection returns a new set that contains all items that are common with omit set and it's length is checked to determine whether any item was common or not.(set-intersection happens internally at C speed so it is faster than normal python loops used in all)

Timing comparison:

>>> import random

No items intersect:

>>> omit = set(random.randrange(1, 10**18) for _ in xrange(100000))
>>> full_list = [[random.randrange(10**19, 10**100) for _ in xrange(100)] for _ in xrange(1000)]

>>> %timeit [item for item in full_list if not omit & set(item)]
10 loops, best of 3: 43.3 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
10 loops, best of 3: 28 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
10 loops, best of 3: 65.3 ms per loop

All items intersect:

>>> full_list = [range(10**3) for _ in xrange(1000)]
>>> omit = set(xrange(10**3))
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 148 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 108 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
100 loops, best of 3: 1.62 ms per loop

Some items intersect:

>>> omit = set(xrange(1000, 10000))
>>> full_list = [range(2000) for _ in xrange(1000)]
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 282 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 159 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
1 loops, best of 3: 227 ms per loop
like image 148
Ashwini Chaudhary Avatar answered Apr 27 '23 08:04

Ashwini Chaudhary