Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is for _ in range(n) slower than for _ in [""]*n?

Testing alternatives to for _ in range(n) (to execute some action n times, even if the action does not depend on the value of n) I noticed that there is another formulation of this pattern that is faster, for _ in [""] * n.

For example:

timeit('for _ in range(10^1000): pass', number=1000000)

returns 16.4 seconds;

whereas,

timeit('for _ in [""]*(10^1000): pass', number=1000000)

takes 10.7 seconds.

Why is [""] * 10^1000 so much faster than range(10^1000) in Python 3?

All testing done using Python 3.3

like image 819
Dunedubby Avatar asked May 22 '15 14:05

Dunedubby


People also ask

What does for _ in range N mean?

So, for i in range(n) means that you're going to do something n times. For example: for i in range(10): print(n) means you're going to print the numbers 0 to 9, because the range function goes from 0 to n-1 if you provide a single argument, and from a to b if you provide two, as in range(a, b)

Why the for and range function is more efficient in Python?

Because you are running more often in code written in C in the interpretor. i.e. i+=1 is in Python, so slow (comparatively), whereas range(0,...) is one C call the for loop will execute mostly in C too.

What does for in range mean in Python?

To simply put it, python's “for i in range” is a way to output values within the range() function. By doing so, it allows you to easily produce the values within a given range. But to fully understand how this line of code works, we have to break down the code to its core principles.


1 Answers

When iterating over range(), objects for all integers between 0 and n are produced; this takes a (small) amount of time, even with small integers having been cached.

The loop over [None] * n on the other hand produces n references to 1 object, and creating that list is a little faster.

However, the range() object uses far less memory, and is more readable to boot, which is why people prefer using that. Most code doesn't have to squeeze every last drop from the performance.

If you need to have that speed, you can use a custom iterable that takes no memory, using itertools.repeat() with a second argument:

from itertools import repeat

for _ in repeat(None, n):

As for your timing tests, there are some problems with those.

First of all, you made an error in your ['']*n timing loop; you did not embed two quotes, you concatenated two strings and produced an empty list:

>>> '['']*n'
'[]*n'
>>> []*100
[]

That's going to be unbeatable in an iteration, as you iterated 0 times.

You also didn't use large numbers; ^ is the binary XOR operator, not the power operator:

>>> 10^1000
994

which means your test missed out on how long it'll take to create a large list of empty values.

Using better numbers and None gives you:

>>> from timeit import timeit
>>> 10 ** 6
1000000
>>> timeit("for _ in range(10 ** 6): pass", number=100)
3.0651066239806823
>>> timeit("for _ in [None] * (10 ** 6): pass", number=100)
1.9346517859958112
>>> timeit("for _ in repeat(None, 10 ** 6): pass", 'from itertools import repeat', number=100)
1.4315521717071533
like image 53
Martijn Pieters Avatar answered Oct 26 '22 23:10

Martijn Pieters