Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimization in Python - do's, don'ts and rules of thumb

Well I was reading this post and then I came across a code which was:

jokes=range(1000000)
domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]

I thought wouldn't it be better to calculate the value of len(jokes) once outside the list comprehension?

Well I tried it and timed three codes

jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]'
10000000 loops, best of 3: 0.0352 usec per loop
jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes);domain=[(0,(l*2)-i-1) for i in range(0,l*2)]'
10000000 loops, best of 3: 0.0343 usec per loop
jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes)*2;domain=[(0,l-i-1) for i in range(0,l)]'
10000000 loops, best of 3: 0.0333 usec per loop

Observing the marginal difference 2.55% between the first and the second made me think - is the first list comprehension

domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]

optimized internally by python? or is 2.55% a big enough optimization (given that the len(jokes)=1000000)?

If this is - What are the other implicit/internal optimizations in Python ?

What are the developer's rules of thumb for optimization in Python?

Edit1: Since most of the answers are "don't optimize, do it later if its slow" and I got some tips and links from Triptych and Ali A for the do's. I will change the question a bit and request for don'ts.

Can we have some experiences from people who faced the 'slowness', what was the problem and how it was corrected?

Edit2: For those who haven't here is an interesting read

Edit3: Incorrect usage of timeit in question please see dF's answer for correct usage and hence timings for the three codes.

like image 897
JV. Avatar asked Dec 31 '08 18:12

JV.


People also ask

How do you optimize a while loop in Python?

We can optimize loops by vectorizing operations. This is one/two orders of magnitude faster than their pure Python equivalents (especially in numerical computations). Vectorization is something we can get with NumPy. Numpy is a library with efficient data structures designed to hold matrix data.


1 Answers

You're not using timeit correctly: the argument to -s (setup) is a statement to be executed once initially, so you're really just testing an empty statement. You want to do

$ python -m timeit -s "jokes=range(1000000)" "domain=[(0,(len(jokes)*2)-i-1) for i in range(0, len(jokes)*2)]"
10 loops, best of 3: 1.08 sec per loop
$ python -m timeit -s "jokes=range(1000000)" "l=len(jokes);domain=[(0,(l*2)-i-1) for i in range(0, l*2)]"
10 loops, best of 3: 908 msec per loop
$ python -m timeit -s "jokes=range(1000000)" "l=len(jokes*2);domain=[(0,l-i-1) for i in range(0, l)]"
10 loops, best of 3: 813 msec per loop

While the speedup is still not dramatic, it's more significant (16% and 25% respectively). So since it doesn't make the code any more complicated, this simple optimization is probably worth it.

To address the actual question... the usual rule of thumb in Python is to

  1. Favor straightforward and readable code over optimization when coding.

  2. Profile your code (profile / cProfile and pstats are your friends) to figure out what you need to optimize (usually things like tight loops).

  3. As a last resort, re-implement these as C extensions, which is made much easier with tools like pyrex and cython.

One thing to watch out for: compared to many other languages, function calls are relatively expensive in Python which is why the optimization in your example made a difference even though len is O(1) for lists.

like image 166
dF. Avatar answered Oct 07 '22 06:10

dF.