Filtering Lists of Tuples by Elements of Tuples

Tags:

I am working in Python (2.7.9) and am trying to filter a list of tuples by a list of elements of those tuples. In particular, my objects have the following form:

tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])]
filter = ['a', 'c']

I am new to Python and the easiest way to filter the tuples that I could discover was with the following list comprehension:

tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]

The resulting filtered list looks like:

tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]

Unfortunately, this list comprehension seems to be very inefficient. I suspect this is because my list of tuples is much larger than my filter, the list of strings. In particular, the filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

The first elements of the 2-tuples are largely distinct, but there are a few instances of duplicate first elements (not sure how many, actually, but by comparison to the cardinality of the list it's not many).

My question: Is there a more efficient way to filter a list of tuples by a list of elements of those tuples?

(Apologies if this is off-topic or a dupe.)

Related question (which does not mention efficiency):

Filter a list of lists of tuples

314

asked Nov 16 '16 18:11

DyingIsFun

1 Answers

In a comment you write:

The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.

in containment tests against a list takes O(N) linear time, which is slow when you do this 134k times. Each time you have to iterate over all those elements to find a match. Given that you are filtering, not all those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.

Use a set instead:

filter_set = set(filter)

Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.

A much smaller component of time you can avoid spending is the tuple assignment; use indexing to extract just the one element you are testing with:

tuples_filtered = [tup for tup in tuples if tup[0] in filter_set]

answered Sep 18 '22 20:09

Martijn Pieters

Related questions
                            
                                Get data from <script> tag in HTML using Scrapy
                            
                                Simple Python server to process GET and POST requests with JSON
                            
                                Does Indexing makes Slice of pandas dataframe faster?
                            
                                flask Template Inheritance tutorial
                            
                                Flask : sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "users" does not exist
                            
                                Unknown column '' in 'field list'. Django
                            
                                Force SymPy to keep the order of terms
                            
                                Computing separate tfidf scores for two different columns using sklearn
                            
                                Python: threads can only be started once
                            
                                What is default header content-type for http post method in curl form? [duplicate]
                            
                                Extracting nationalities and countries from text
                            
                                Access last elements of inner multiindex level in pandas dataframe
                            
                                Trouble using numpy.load
                            
                                Creating a dictionary type column in dataframe
                            
                                Python: No Module named Zlib, Mac OS X El Capitan 10.11.6
                            
                                Why is numpy.prod() incorrectly returning negative results, or 0, for my long lists of natural numbers?
                            
                                interactive conditional histogram bucket slicing data visualization
                            
                                Cannot cast array data from dtype('O') to dtype('float64')
                            
                                You have 3 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth
                            
                                What does python return on the leap second

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Filtering Lists of Tuples by Elements of Tuples

Tags:

performance

python

python-2.7

DyingIsFun

People also ask

1 Answers

Martijn Pieters

Recent Activity

Donate For Us