Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering Lists of Tuples by Elements of Tuples

I am working in Python (2.7.9) and am trying to filter a list of tuples by a list of elements of those tuples. In particular, my objects have the following form:

tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])]
filter = ['a', 'c']

I am new to Python and the easiest way to filter the tuples that I could discover was with the following list comprehension:

tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]

The resulting filtered list looks like:

tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]

Unfortunately, this list comprehension seems to be very inefficient. I suspect this is because my list of tuples is much larger than my filter, the list of strings. In particular, the filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

The first elements of the 2-tuples are largely distinct, but there are a few instances of duplicate first elements (not sure how many, actually, but by comparison to the cardinality of the list it's not many).

My question: Is there a more efficient way to filter a list of tuples by a list of elements of those tuples?

(Apologies if this is off-topic or a dupe.)

Related question (which does not mention efficiency):

Filter a list of lists of tuples

like image 314
DyingIsFun Avatar asked Nov 16 '16 18:11

DyingIsFun


People also ask

How do you filter a list of tuples?

To filter a list of tuples in Python: Use the filter() function to filter the list. The filter function returns an iterator containing the results. Pass the filter object to the list() class to convert it to a list.

How do you find the elements in a list of tuples?

In the majority of programming languages when you need to access a nested data type (such as arrays, lists, or tuples), you append the brackets to get to the innermost item. The first bracket gives you the location of the tuple in your list. The second bracket gives you the location of the item in the tuple.

How do you filter a list element in Python?

Python has a built-in function called filter() that allows you to filter a list (or a tuple) in a more beautiful way. The filter() function iterates over the elements of the list and applies the fn() function to each element. It returns an iterator for the elements where the fn() returns True .

How do you sort the list of tuples by the second element?

Use the key argument of the sorted() function to sort a list of tuples by the second element, e.g. sorted_list = sorted(list_of_tuples, key=lambda t: t[1]) . The function will return a new list, sorted by the second tuple element.


1 Answers

In a comment you write:

The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

in containment tests against a list takes O(N) linear time, which is slow when you do this 134k times. Each time you have to iterate over all those elements to find a match. Given that you are filtering, not all those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.

Use a set instead:

filter_set = set(filter)

Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.

A much smaller component of time you can avoid spending is the tuple assignment; use indexing to extract just the one element you are testing with:

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]
like image 59
Martijn Pieters Avatar answered Sep 18 '22 20:09

Martijn Pieters