
How to eliminate items from a list of lists in a fast way while imposing some restrictions on their items?

My very first post and question here...

So, let list_a be the list of lists:

list_a = [[2,7,8], [3,4,2], [5,10], [4], [2,3,5]...]

Let list_b be another list of integers: list_b = [5,7]

I need to exclude all lists in list_a whose items include at least one item from list_b. The result for the example above should look like list_c = [[3,4,2], [4]...]

If list_b was not a list but a single number b, then one could define list_c in one line as:

list_c = [x for x in list_a if b not in x]

I am wondering if it is possible to write an equally elegant one-liner when list_b holds several values. Of course, I can just loop through all of list_b's values, but maybe there is a faster option?

DimaWest asked Dec 22 '22 15:12


1 Answer

Let's first consider the task of checking an individual element of list_a, such as [2,7,8]. No matter what, we're going to need a way to do that, and then we're going to apply it to the whole list with a list comprehension. I'll use a as the name for such a list, and b for an element of list_b.

The straightforward way to write this is using the any builtin, which works elegantly in combination with generator expressions: any(b in a for b in list_b).

The logic is simple: we create a generator expression (like a lazily-evaluated list comprehension) to represent the result of the b in a check applied to each b in list_b. A generator expression is written like a list comprehension with () in place of the []; due to a special syntax rule, we may drop those parentheses when the generator expression is the sole argument to a function. Then any does exactly what it sounds like: it checks (with early bail-out) whether any of the elements in the iterable (which includes generator expressions) is truthy.
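As a minimal sketch of that check in use, here it is applied to the sample data from the question (with the trailing ... left out, so the lists here are just the five shown):

```python
list_a = [[2, 7, 8], [3, 4, 2], [5, 10], [4], [2, 3, 5]]
list_b = [5, 7]

# Keep a sublist only if none of list_b's values occurs in it.
# any() stops at the first b that is found in a.
list_c = [a for a in list_a if not any(b in a for b in list_b)]

print(list_c)  # [[3, 4, 2], [4]]
```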


However, we can likely do better by taking advantage of set intersection. The key insight is that the test we are trying to do is symmetric; considering the test between a and list_b (and coming up with another name for elements of a), we could equally have written any(x in list_b for x in a), except that it's harder to understand that.

Now, it doesn't help to make a set from a, because we have to iterate over a anyway in order to do that. (The generator expression already iterates implicitly, since the in operator has to scan a list to test membership.) However, if we make a set from list_b, then we can do that once, ahead of time, and just have any(x in set_b for x in a).
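A short sketch of this intermediate version, again on the question's sample data. The set is built once, outside the comprehension, so each x in set_b test is a hash lookup rather than a scan of list_b:

```python
list_a = [[2, 7, 8], [3, 4, 2], [5, 10], [4], [2, 3, 5]]
list_b = [5, 7]

# Build the set once, ahead of time, instead of once per sublist.
set_b = set(list_b)

# Same test as before, but with the roles swapped: iterate over a,
# and look each element up in set_b.
list_c = [a for a in list_a if not any(x in set_b for x in a)]

print(list_c)  # [[3, 4, 2], [4]]
```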

But that, in turn, is a) as described above, hard to understand; and b) overlooking the built-in machinery of sets. The operator & normally used for set intersection requires a set on both sides, but the named method .intersection does not. Thus, set_b.intersection(a) does the trick.
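To illustrate the difference between the operator and the method, here is a small sketch using one sublist from the question. An empty result set is falsy, which is what lets the result be used directly in a condition:

```python
set_b = {5, 7}
a = [2, 7, 8]

# The named method accepts any iterable, including a plain list.
common = set_b.intersection(a)
print(common)  # {7}

# No shared elements gives an empty set, which is falsy.
empty = set_b.intersection([3, 4, 2])
print(bool(empty))  # False

# The & operator insists on a set on both sides.
try:
    set_b & a
except TypeError as exc:
    print("TypeError:", exc)
```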


Putting it all together, we get:

set_b = set(list_b)
list_c = [a for a in list_a if not set_b.intersection(a)]
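
Whether the set-based version is actually faster depends on the sizes involved, so it is worth measuring on your own data. The following is only a rough sketch: the function names with_any and with_intersection, the synthetic data and the repeat count are all made up for illustration, and no particular timings are implied.

```python
import random
import timeit

random.seed(0)  # fixed seed so the synthetic data is repeatable

# Hypothetical test data: 2000 short sublists and 50 values to exclude.
list_a = [[random.randrange(1000) for _ in range(random.randint(1, 10))]
          for _ in range(2000)]
list_b = random.sample(range(1000), 50)


def with_any(list_a, list_b):
    # Straightforward version: scan each sublist once per value in list_b.
    return [a for a in list_a if not any(b in a for b in list_b)]


def with_intersection(list_a, list_b):
    # Set-based version: build the set once, then intersect per sublist.
    set_b = set(list_b)
    return [a for a in list_a if not set_b.intersection(a)]


# Both versions should select exactly the same sublists.
assert with_any(list_a, list_b) == with_intersection(list_a, list_b)

for func in (with_any, with_intersection):
    seconds = timeit.timeit(lambda: func(list_a, list_b), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```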

Karl Knechtel answered Dec 28 '22 10:12