I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
1. Optimize the algorithm. For any code, you should always allocate some time to thinking about the right algorithm to use. So, the first task is to select and improve the algorithm that will be used most frequently in the code.
2. Avoid type conversion. Whenever possible, plan to use the same type of variables for processing.
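As a small illustration of the second point, converting data once and reusing it beats re-converting it on every use (a hedged sketch; the data and loop counts are made up):
values = ["7"] * 1000
# re-converting the same strings on every pass (wasteful)
total = 0
for i in xrange(100):
    total += sum(int(v) for v in values)
# convert once, then reuse the converted list
numbers = [int(v) for v in values]
total = 0
for i in xrange(100):
    total += sum(numbers)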
If your question is about optimising python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
There are lots of ways of profiling python code, but there are two that I'll mention: the cProfile (or profile) module, and PyCallGraph.
cProfile is what you should actually use, though interpreting the results can be a bit daunting. It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')  # stats are written to the named file
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()  # sort by time spent inside each function
This will show you in which functions most of the time is spent.
PyCallGraph provides perhaps the prettiest and easiest way of profiling python programs, and it's a good introduction to understanding where the time in your program is spent. However, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz -- ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Most of the standard library modules are written in C, and they will execute hundreds of times faster than equivalent python implementations of, say, bisection search.
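For instance, a minimal sketch using the standard library's bisect module instead of a hand-rolled binary search (the data is made up for illustration):
import bisect
haystack = [2, 4, 8, 16, 32, 64]             # must already be sorted
position = bisect.bisect_left(haystack, 16)  # binary search implemented in C
print position                               # prints 3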
The interpreter will do some things for you, like looping. Really? Yes! You can use the map, reduce, and filter built-in functions to significantly speed up tight loops. Consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well, obviously this could be faster because the interpreter only has to deal with a single statement rather than two, but that's a bit vague... in fact, this is faster for two reasons. First, in the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! Even with caching this is a bit of an overhead. Second, the loop itself runs inside the interpreter's C code rather than as python bytecode, so there is no per-iteration dispatch overhead.
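Likewise for filter and reduce; a minimal sketch (python 2, where filter returns a list and reduce is a built-in):
evens = filter(lambda x: x % 2 == 0, xrange(10))  # [0, 2, 4, 6, 8]
total = reduce(lambda a, b: a + b, xrange(10))    # 45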
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!) If you come from a background of programming in a compiled language, like C or Fortran, then some things about the performance of different python statements might be surprising:
try-ing is cheap, if-ing is expensive. If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And if doWhatWeNormallyDo() would throw an exception when something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol lookup each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
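The same principle underlies the common EAFP ("easier to ask forgiveness than permission") idiom for dictionaries; a minimal sketch with made-up names:
d = {"a": 1}
try:
    value = d["b"]   # dive straight in; cheap when the key is usually present
except KeyError:
    value = 0        # handle the rare missing-key case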
Because of the cost of name lookups it can also be better to cache global values within functions, and to bake simple boolean tests into functions, like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach: instead of testing a variable, exploit the fact that the interpreter is doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
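And for the global-value caching mentioned above, a minimal sketch (made-up names): binding a global or built-in to a default argument turns each lookup into a fast local access:
def total_lengths(items, _len=len):  # len is bound once, at function definition time
    total = 0
    for item in items:
        total += _len(item)          # local lookup instead of a built-in lookup
    return total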
PyPy is a python implementation written in (a restricted subset of) python. Surely that means it will run code infinitely slower? Well, no. PyPy actually uses a Just-In-Time (JIT) compiler to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
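Assuming pypy is installed and on your path, trying it is usually as simple as running your script with it instead of the stock interpreter:
pypy ./myprogram.py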
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea; sometimes, though, you have to....
First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop, or if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input(), as sketched below.
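A minimal sketch of the single-loop version (same behaviour, one list instead of four):
N = int(raw_input())
coordinates = []
for i in xrange(N):
    x, y = raw_input().split(" ")         # read one "x y" pair per line
    coordinates.append([int(x), int(y)])  # convert and store in a single pass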
This answer shows how I locate code to optimize. Suppose there is some line of code you could replace, and it is costing, say, 40% of the time. Then it resides on the call stack 40% of the time. If you take 10 samples of the call stack, it will appear on 4 of them, give or take. It really doesn't matter how many samples show it. If it appears on two or more, and if you can replace it, you will save whatever time it costs.
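If you want to take such samples by hand, one low-tech way on Unix is to dump the stack whenever a signal arrives (a hedged sketch; in most terminals Ctrl-\ sends SIGQUIT):
import signal
import traceback
def dump_stack(signum, frame):
    traceback.print_stack(frame)  # prints one "sample" of the current call stack
signal.signal(signal.SIGQUIT, dump_stack)  # press Ctrl-\ a few times while it runs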
Most of the InterviewStreet problems seem to be tested in a way that verifies you have found an algorithm with the right big-O complexity, rather than that you have coded the solution in the most efficient way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.
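As a small example of the kind of change that matters here (made-up data): replacing a linear scan with a hash lookup changes the complexity class, which dwarfs any micro-optimisation:
big_list = range(1000000)
big_set = set(big_list)   # built once: O(n)
print 999999 in big_list  # O(n): scans the whole list per membership test
print 999999 in big_set   # O(1) average: hashes straight to the answer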