Optimization in Python - do's, don'ts and rules of thumb

Tags:

python

optimization

I was reading this post and came across the following code:

jokes=range(1000000)
domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]

I thought: wouldn't it be better to calculate len(jokes) once, outside the list comprehension?

So I tried it and timed three versions:

jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]'
10000000 loops, best of 3: 0.0352 usec per loop
jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes);domain=[(0,(l*2)-i-1) for i in range(0,l*2)]'
10000000 loops, best of 3: 0.0343 usec per loop
jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes)*2;domain=[(0,l-i-1) for i in range(0,l)]'
10000000 loops, best of 3: 0.0333 usec per loop

The marginal difference of 2.55% between the first and the second made me wonder: is the first list comprehension

domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]

optimized internally by Python? Or is 2.55% a big enough gain (given that len(jokes) is 1000000)?

If it is optimized, what other implicit/internal optimizations does Python perform?

What are the developer's rules of thumb for optimization in Python?

Edit1: Most of the answers say "don't optimize, do it later if it's slow", and I got some tips and links from Triptych and Ali A for the do's, so I will change the question a bit and ask for the don'ts.

Can we hear from people who have faced the 'slowness': what was the problem, and how was it corrected?

Edit2: For those who haven't here is an interesting read

Edit3: My usage of timeit in the question is incorrect. Please see dF's answer for the correct usage and the resulting timings for the three versions.

897

asked Dec 31 '08 18:12

JV.

1 Answer

You're not using timeit correctly: the argument to -s (setup) is a statement that is executed once, before timing starts. Since you passed all of your code as setup, you were really just timing an empty statement. You want to do

$ python -m timeit -s "jokes=range(1000000)" "domain=[(0,(len(jokes)*2)-i-1) for i in range(0, len(jokes)*2)]"
10 loops, best of 3: 1.08 sec per loop
$ python -m timeit -s "jokes=range(1000000)" "l=len(jokes);domain=[(0,(l*2)-i-1) for i in range(0, l*2)]"
10 loops, best of 3: 908 msec per loop
$ python -m timeit -s "jokes=range(1000000)" "l=len(jokes)*2;domain=[(0,l-i-1) for i in range(0, l)]"
10 loops, best of 3: 813 msec per loop

While the speedup is still not dramatic, it's more significant (16% and 25% respectively, relative to the first version). Since it doesn't make the code any more complicated, this simple optimization is probably worth it.
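The same measurement can also be sketched from inside Python with the timeit module. This is only an illustration, not part of the original timings: it assumes Python 3 (hence list(range(...))), uses a much smaller list so that it finishes quickly, and the names SETUP, STATEMENTS and time_variants are made up for the example.

```python
import timeit

# Setup code: run once before each timed run and NOT included in the timing.
SETUP = "jokes = list(range(1000))"

# The three variants from the question, passed as the statement to be timed.
STATEMENTS = {
    "len_in_loop": "domain = [(0, (len(jokes)*2) - i - 1) for i in range(0, len(jokes)*2)]",
    "len_hoisted": "l = len(jokes); domain = [(0, (l*2) - i - 1) for i in range(0, l*2)]",
    "len_times_two_hoisted": "l = len(jokes)*2; domain = [(0, l - i - 1) for i in range(0, l)]",
}


def time_variants(number=200):
    """Return the best-of-3 total time in seconds for each variant."""
    return {
        name: min(timeit.repeat(stmt, setup=SETUP, repeat=3, number=number))
        for name, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for name, seconds in time_variants().items():
        print(f"{name}: {seconds:.4f} s for 200 loops")
```

The absolute numbers depend on the machine and the Python version, so only the relative ordering of the variants is worth comparing.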

To address the actual question... the usual rules of thumb in Python are:

1. Favor straightforward and readable code over optimization when coding.
2. Profile your code (profile / cProfile and pstats are your friends) to figure out what you need to optimize (usually things like tight loops).
3. As a last resort, re-implement these hot spots as C extensions, which is made much easier with tools like Pyrex and Cython.
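The profiling step might look roughly like the sketch below. It assumes Python 3; build_domain and profile_report are hypothetical helper names introduced here for illustration, while cProfile.Profile, runcall, pstats.Stats, sort_stats and print_stats are the standard library calls.

```python
import cProfile
import io
import pstats


def build_domain(n):
    """The list comprehension from the question, wrapped in a function."""
    jokes = list(range(n))
    return [(0, (len(jokes) * 2) - i - 1) for i in range(0, len(jokes) * 2)]


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(build_domain, 100000))
```

In the printed table, the ncalls column for the built-in len shows how often it was actually called, which is the kind of hint that points at a tight loop worth optimizing.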

One thing to watch out for: compared to many other languages, function calls are relatively expensive in Python, which is why the optimization in your example made a difference even though len is O(1) for lists.
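To see that the interpreter really does call len on every iteration rather than hoisting it, one can count the calls. The following is a small sketch assuming Python 3; CountingList and count_len_calls are hypothetical names used only for this illustration.

```python
class CountingList(list):
    """A list subclass that counts how often len() is called on it."""

    len_calls = 0  # class-level default, overridden per instance on first call

    def __len__(self):
        self.len_calls += 1
        return super().__len__()


def count_len_calls(n):
    """Return (calls with len inside the comprehension, calls with len hoisted)."""
    jokes = CountingList(range(n))

    # Variant 1: len(jokes) appears inside the comprehension.
    jokes.len_calls = 0
    domain = [(0, (len(jokes) * 2) - i - 1) for i in range(0, len(jokes) * 2)]
    in_loop = jokes.len_calls

    # Variant 2: len(jokes) is computed once, before the comprehension.
    jokes.len_calls = 0
    l = len(jokes) * 2
    domain = [(0, l - i - 1) for i in range(0, l)]
    hoisted = jokes.len_calls

    return in_loop, hoisted


if __name__ == "__main__":
    print(count_len_calls(5))
```

With n items, the first variant calls len once to build the range and once for each of the 2n elements, while the hoisted variant calls it exactly once.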

166

answered Oct 07 '22 06:10

dF.
