Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

`in range` construction in python 2 --- working too slow

I wanted to check if a given x lies in the interval [0,a-1]. As a lazy coder I wrote

x in range(a)

and (because that piece of code was in 4.5 nested loops) quickly run into performance issues. I tested it and indeed, it turns out runtime of n in range(n) lies in O(n), give or take. I actually thought my code would be optimized to x >= 0 and x < a but it seems this is not the case. Even if I fix the range(a) in advance the time doesn't become constant (though it improves a lot) - see side notes.

So, my question is:

Should I use x >= 0 and x < a and never write x in range(a) ever again? Is there an even better way of writing it?


Side notes:

  1. I tried searching SO for range, python-2.7, performance tags put together, found nothing (same with python-2.x).
  2. If I try following:

    i = range(a)
    ...
    x in i
    

    so that the range is fixed and I only measure runtime of x in i, I still get runtime in O(x) (assuming a is large enough).

  3. runtime of n in xrange(n) lies in O(n) as well.
  4. I found this post, which asks similar question for python 3. I decided to test same stuff on python 3 and it rolled through tests like it's nothing. I got sad for python 2.
like image 372
mike239x Avatar asked Feb 27 '18 04:02

mike239x


People also ask

What is faster than range in Python?

In this case, the for loop is faster, but also more elegant compared to while. Please, have in mind that you can't apply list comprehensions in all cases when you need loops. Some more complex situations require the ordinary for or even while loops.

Is range a lazy Python?

So what is range? The range object is “lazy” in a sense because it doesn't generate every number that it “contains” when we create it. Instead it gives those numbers to us as we need them when looping over it. If you're looking for a description for range objects, you could call them “lazy sequences”.

How do you speed up a loop in Python?

A faster way to loop using built-in functions A faster way to loop in Python is using built-in functions. In our example, we could replace the for loop with the sum function. This function will sum the values inside the range of numbers. The code above takes 0.84 seconds.

Are loops slow in Python?

Looping over Python arrays, lists, or dictionaries, can be slow. Thus, vectorized operations in Numpy are mapped to highly optimized C code, making them much faster than their standard Python counterparts.


1 Answers

The problem with range in Python 2 is that it creates a list of values, so x in range(a) will create a list and linearly scan that list. xrange should be a generator, but it is not much faster; probably still just linearly scans the values, just without creating the entire list first.

In [2]: %timeit 5*10**5 in range(10**6 + 1)  # Python 2
10 loops, best of 3: 18.1 ms per loop

In [3]: %timeit 5*10**5 in xrange(10**6 + 1) # Python 2
100 loops, best of 3: 6.21 ms per loop

In Python 3, range is much smarter, not only not creating the entire list, but also providing a fast implementation of the contains check.

In [1]: %timeit 5*10**5 in range(10**6 + 1)  # Python 3
1000000 loops, best of 3: 324 ns per loop

Even faster and IMHO more readable: Using comparison chaining:

In [2]: %timeit 0 <= 5*10**5 < 10**6 + 1     # Python 2 or 3
10000000 loops, best of 3: 46.6 ns per loop

Should I use x >= 0 and x < a and never write x in range(a) ever again? Is there an even better way of writing it?

"No", "it depends", and "yes". You should not use x >= 0 and x < a because 0 <= x < a is shorter and easier to parse (for puny humans), and is interpreted as (0 <= x) and (x < a). And you should not use in range in Python 2, but in Python 3, you can use it if you like.

Still, I'd prefer comparison chaining, since a <= x < b is much more explicit about bounds than x in range(a, b) (what if x == b?), which could prevent many off-by-one errors or +1 padding the range.

Also, note that 0 <= x < a is not strictly the same as x in range(0, a), as a range will only ever contain integer values, i.e. 1.5 in range(0, 5) is False, whereas 0 <= 1.5 < 5 is True, which may of may not what you want. Also, using range you can use steps other than 1, e.g. 5 in range(4, 10, 2) is False, but the same can also be implemented using pure math, e.g. as (4 <= x < 10) and (x - 4 % 2 == 0).

like image 115
tobias_k Avatar answered Sep 23 '22 22:09

tobias_k