I have a large code base, and something is now taking too long to execute. I have no idea what.
The code never raises an exception, it just appears to keep on processing something.
What I'd like to do is place timers around some functions to test which one is the culprit. But I'm not sure if that is the right approach, or how to do it.
I can't easily raise exceptions at points around the code, because there are various loops that call the same functions, that perhaps only sometimes take too long.
What is the best strategy?
One solution is to use cProfile, which comes built into Python, to tell what functions your code is spending the most time in. Critically, this profiling works even if you stop your code with a KeyboardInterrupt. Thus, you can start your code running and profiling, stop it after a minute or two, and then see where it was spending its time.
Run your code with these extra -m and -o arguments:
python -m cProfile -o profile.txt myscript.py
and then once the program has finished running, run the following code (from another script, for example):
import pstats
p = pstats.Stats('profile.txt')
p.strip_dirs().sort_stats("time").print_stats()
This will print a list of the functions sorted by the total time you spent in them.
Here's a demonstration of using profiling to debug an infinite loop. Let's say myscript.py had the following code.
def f():
while True:
g(100000)
def g(n):
x = []
for i in range(n):
x.append(n)
f()
Of course this causes an infinite loop- g will be run many, many times. So I run the profiling command above, but then I stop it after about 30-40 seconds (could be much shorter even). Its profile will then be printed as:
Wed Jan 30 10:58:50 2013 profile.txt
115414854 function calls in 37.705 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1155 25.787 0.022 37.020 0.032 test3.py:5(g)
115412541 10.060 0.000 10.060 0.000 {method 'append' of 'list' objects}
1155 1.173 0.001 1.173 0.001 {range}
1 0.685 0.685 37.705 37.705 test3.py:1(f)
1 0.000 0.000 37.705 37.705 test3.py:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Notice that our infinitely looping function g is right at the top of the list in terms of time spent in the function.
Note: Just because the code is spending all its time in one function doesn't mean that the loop is directly around that function- it could be called by a function that is called by a function (etc) that is in an infinite loop (notice that append is near the top of the list since it is called inside g). One alternative trick is to sort them according to the cumulative time spent in each function, using .sort_stats("cum"). A combination of these two approaches, along with a little detective work (looking at the code and adding debug messages), should be able to identify the culprit.
I would suggest using the logging module to add debug statements to various points in the code. Using the logging module allows you to easily turn on and off the debugging statements as well as control what information is displayed.
Another option is to use an IDE where you can easily add break points, allowing you to figure out where the code goes into the long loop.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With