
Rationale behind Python's preferred for syntax

What is the rationale behind the advocated use of the for i in xrange(...)-style looping constructs in Python? For simple integer looping, the difference in overheads is substantial. I conducted a simple test using two pieces of code:

File idiomatic.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    for x in xrange(N):
        for y in xrange(M):
            pass

File cstyle.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1

Profiling results were as follows:

bash-3.1$ time python cstyle.py

real    0m0.109s
user    0m0.015s
sys     0m0.000s

bash-3.1$ time python idiomatic.py

real    0m4.492s
user    0m0.000s
sys     0m0.031s

I can understand why the Pythonic version is slower -- I imagine it has a lot to do with calling xrange N times; perhaps this could be eliminated if there were a way to rewind a generator. However, with this great a difference in execution time, why would one prefer the Pythonic version?

Edit: I conducted the tests again using the code Mr. Martelli provided, and the results were indeed better now:

I thought I'd enumerate the conclusions from the thread here:

1) Lots of code at the module scope is a bad idea, even if the code is enclosed in an if __name__ == "__main__": block.

2) Curiously enough, changing thebadone to match my incorrect version (letting y grow without resetting it) produced little difference in performance, even for larger values of M and N.
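Conclusion 1 can be checked at the bytecode level. The following is only a rough sketch, written for Python 3 (so range stands in for xrange); the helper opnames and the sample loop are made up for illustration. It compiles the same loop once as module-level code and once as a function body, then looks at which family of variable-access instructions each one uses.

```python
import dis

# Assumption: Python 3, so range() plays the role of Python 2's xrange().
# The same tiny loop, once as module-level source and once inside a function.
MODULE_SOURCE = """
total = 0
for i in range(3):
    total += i
"""


def in_function():
    total = 0
    for i in range(3):
        total += i
    return total


def opnames(code):
    """Hypothetical helper: the set of opcode names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}


module_ops = opnames(compile(MODULE_SOURCE, "<module-level>", "exec"))
function_ops = opnames(in_function.__code__)

# Module-level variables go through *_NAME instructions (dictionary lookups);
# function locals go through the *_FAST family (indexed slots).
print("module uses NAME ops:  ", any(op.endswith("_NAME") for op in module_ops))
print("function uses FAST ops:", any("FAST" in op for op in function_ops))
```

Exact opcode names vary between CPython versions, so the check only looks for the NAME and FAST families rather than specific instructions. The dictionary-based access at module scope is a large part of why moving the loops into a function helps.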

susmits asked Apr 10 '10 01:04


4 Answers

Here's the proper comparison, e.g. in loop.py:

M = 10000
N = 10000

def thegoodone():
    for x in xrange(N):
        for y in xrange(M):
            pass

def thebadone():
    x = 0
    while x < N:
        y = 0
        while y < M:
            y += 1
        x += 1

All substantial code should always be in functions -- putting a hundred million loops at a module's top level shows reckless disregard for performance and makes a mockery of any attempts at measuring said performance.

Once you've done that, you see:

$ python -mtimeit -s'import loop' 'loop.thegoodone()'
10 loops, best of 3: 3.45 sec per loop
$ python -mtimeit -s'import loop' 'loop.thebadone()'
10 loops, best of 3: 10.6 sec per loop

So, properly measured, the bad way that you advocate is about 3 times slower than the good way which Python promotes. I hope this makes you reconsider your erroneous advocacy.
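For anyone who wants to repeat the measurement from inside a script rather than from the shell, here is a minimal sketch using the timeit module. It assumes Python 3 (range instead of xrange) and uses much smaller M and N so that it finishes quickly; the function names for_loop and while_loop are made up here, and unlike the original functions they count iterations so the two versions can be checked against each other.

```python
import timeit

# Assumption: sizes shrunk from 10000 x 10000 so the sketch runs in well under a second.
N = 200
M = 200


def for_loop():
    """Nested for loops over range (Python 3's equivalent of xrange)."""
    count = 0
    for x in range(N):
        for y in range(M):
            count += 1
    return count


def while_loop():
    """The same iteration written with manual counters, resetting y each time."""
    count = 0
    x = 0
    while x < N:
        y = 0
        while y < M:
            count += 1
            y += 1
        x += 1
    return count


if __name__ == "__main__":
    for fn in (for_loop, while_loop):
        # Best of 3 runs of 10 calls each, in the spirit of `python -mtimeit`.
        best = min(timeit.repeat(fn, number=10, repeat=3))
        print("%s: %.4f s per 10 calls" % (fn.__name__, best))
```

The absolute numbers depend on the machine and interpreter version, so treat the output as a relative comparison only.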

Alex Martelli answered Oct 12 '22 15:10


You forgot to reset y to 0 after the inner loop.

#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1
        y = 0

Edit: 20.63s after the fix vs. 6.97s using xrange.
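The effect of the missing reset is easy to see by counting how often the inner loop body runs. This is an illustrative sketch with small, made-up sizes; the function names are hypothetical and the counting is added purely for demonstration.

```python
N = 5  # outer iterations (small values chosen for illustration)
M = 4  # inner iterations


def inner_steps_without_reset():
    """Mirrors the question's loop: y is never reset to 0."""
    steps = 0
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
            steps += 1
        x += 1
    return steps


def inner_steps_with_reset():
    """Mirrors the fixed loop: y starts from 0 on every outer pass."""
    steps = 0
    x = 0
    while x < N:
        y = 0
        while y < M:
            y += 1
            steps += 1
        x += 1
    return steps


print(inner_steps_without_reset())  # inner body runs only M times in total
print(inner_steps_with_reset())     # inner body runs N * M times
```

Without the reset, the inner loop does real work only on the first outer pass, which is why the original cstyle.py appeared to finish almost instantly.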

Glenn Maynard answered Oct 12 '22 13:10


The for i in ... syntax is great for iterating over data structures. In a lower-level language, you would generally be iterating over an array indexed by an int, but with the Python syntax you can eliminate the indexing step.
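As a small illustration of eliminating the indexing step, the sketch below walks the same list three ways. The list contents and variable names are invented for the example.

```python
names = ["alpha", "beta", "gamma"]  # sample data, made up for illustration

# Lower-level style: drive the loop with an integer index.
lengths_by_index = []
i = 0
while i < len(names):
    lengths_by_index.append(len(names[i]))
    i += 1

# Python style: iterate over the elements directly, no index needed.
lengths_direct = []
for name in names:
    lengths_direct.append(len(name))

# When the position is genuinely needed, enumerate() supplies it.
labelled = []
for position, name in enumerate(names):
    labelled.append((position, name))
```

Both approaches produce the same lengths; the direct form simply has fewer moving parts to get wrong.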

Mark Harrison answered Oct 12 '22 15:10


this is not a direct answer to the question, but i want to open the dialog a bit more on xrange(). two things:

A. there is something wrong with one of the OP statements that no one has corrected yet (yes, in addition to the bug in the code of not resetting y):

"I imagine it has a lot to do with calling xrange N times...."

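unlike traditional counting for loops, Python's is more like a shell's foreach... looping over an iterable. therefore, a loop's xrange() call is evaluated once, when that loop starts, not once per iteration. in the OP's nested code that means one xrange(N) call plus N xrange(M) calls (the inner loop starts N times), each of which is cheap next to the M iterations it drives.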

B. xrange() is the name of this function in Python 2 only. in Python 3 it is renamed to range() and replaces the old list-returning range(), so keep this in mind when porting. if you didn't know already, Python 2's xrange() returns a lazy, iterator-like object while Python 2's range() builds a full list. since the list version is less memory-efficient, it was dropped in Python 3 in favor of the xrange() behavior. the workaround in Python 3, for all those who need an actual list, is list(range(N)).
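both points can be tried out with a short sketch. it assumes Python 3, so range() stands in for Python 2's xrange(); counted_range is a hypothetical wrapper invented here just to record how many times the range call itself is evaluated.

```python
# Assumption: Python 3, where range() behaves like Python 2's xrange().
calls = {"outer": 0, "inner": 0}


def counted_range(label, n):
    """Hypothetical helper: note that it was called, then return a normal range."""
    calls[label] += 1
    return range(n)


N, M = 4, 3  # small sizes chosen for illustration
iterations = 0
for x in counted_range("outer", N):
    for y in counted_range("inner", M):
        iterations += 1

# Point A: the call happens once per loop start, not once per iteration.
print(calls)       # outer evaluated once, inner once per outer pass
print(iterations)  # total inner-body executions

# Point B: range() is lazy; wrap it in list() when a real list is needed.
lazy = range(5)
as_list = list(lazy)
print(isinstance(lazy, list), as_list)
```

the loop body runs N * M times, yet the range call itself is evaluated only N + 1 times in total, so the cost of creating the range objects is small compared with the iteration itself.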

wescpy answered Oct 12 '22 14:10