Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to check if the set contains numbers in a given range in Python

Tags:

python

range

set

What is the fastest way to check if a set contains at least one number within a given range?

For example setA = set(1,4,7,9,10), lowerRange=6, upperRange=8, will return True because of 7.

Currently I am using:

filtered = filter(lambda x: lowerRange<=x<=upperRange,setA)

Then if filtered is not empty, returns a True.

Assuming that setA can be a very large set, is this the optimal solution? Or is this iterating through the entire setA?

like image 980
user1179317 Avatar asked Jan 19 '26 02:01

user1179317


1 Answers

Since the membership chick is approximately O(1) for sets you can use a generator expression within any() built-in function:

rng = range(6, 9)
any(i in setA for i in rng)

Note that for a short range you'll give a better performance with set.intersection():

In [2]: a = {1,4,7,9,10}

In [3]: rng = range(6, 9)

In [8]: %timeit bool(a.intersection(rng))
1000000 loops, best of 3: 344 ns per loop

In [9]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop

But in this case for longer ranges you'd definitely go with any():

In [10]: rng = range(6, 9000)

In [11]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop

In [12]: %timeit bool(a.intersection(rng))
1000 loops, best of 3: 233 µs per loop

And note that the reason that any() performs better is because it returns True right after it encounter an item that exist in your set (based on our membership condition) an since the number 8 is at the beginning of the range it makes the any() per forms so fast. Also as mentioned in comment as a more pythonic way for checking the validity of the intersection of an iterable within a set you can use isdisjoint() method. Here is a benchmark with this method for small rage:

In [26]: %timeit not a.isdisjoint(rng)
1000000 loops, best of 3: 153 ns per loop

In [27]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 609 ns per loop

And here is a benchmark that makes the any() checks the membership for all the numbers which shows that isdisjoint() performs so better:

In [29]: rng = range(8, 1000)

In [30]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 595 ns per loop

In [31]: %timeit not a.isdisjoint(rng)
10000000 loops, best of 3: 142 ns per loop
like image 154
Mazdak Avatar answered Jan 20 '26 17:01

Mazdak