
Is there a statistical profiler for python? If not, how could I go about writing one?

I would need to run a python script for some random amount of time, pause it, get a stack traceback, and unpause it. I've googled around for a way to do this, but I see no obvious solution.

Asked Apr 11 '11 by shino

People also ask

How do you do profiling in python?

Python includes a built-in module called cProfile which is used to measure the execution time of a program. The cProfile module reports how long the program spends executing and how many times each function is called.
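
For reference, a minimal cProfile run might look something like this (slow_sum is just a made-up workload, not part of any library):

import cProfile

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Run the statement under the profiler and print per-function timings,
# sorted by cumulative time.
cProfile.run("slow_sum(1_000_000)", sort="cumulative")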

Which tool will Analyse the data collected by the python profiler?

Python profilers such as cProfile help find which parts of the program or code take the most time to run. This article walks through using the cProfile module to extract profiling data, the pstats module to report it, and snakeviz to visualize it.
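
A rough sketch of that workflow; my_workload and the file name profile.out are placeholders of your choosing:

import cProfile
import pstats

def my_workload():
    # stand-in for the code you actually want to profile
    return sum(i * i for i in range(1_000_000))

# Write raw profiling data to a file instead of printing it.
cProfile.run("my_workload()", "profile.out")

# Load the data with pstats and report the ten most expensive entries.
stats = pstats.Stats("profile.out")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)

# For a visual view, snakeviz can read the same file:
#   pip install snakeviz
#   snakeviz profile.out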

How do you use line profiler in python?

The line_profiler test cases (found on GitHub) include an example of generating profile data from within a Python script. You wrap the function you want to profile and then call the wrapper, passing any desired arguments. Additional functions can be registered for profiling as well.
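
A minimal sketch of that wrapping pattern, assuming line_profiler is installed and hot_loop stands in for your own function:

from line_profiler import LineProfiler

def hot_loop(n):
    total = 0
    for i in range(n):
        total += i * i   # line-by-line timings show where the time really goes
    return total

profiler = LineProfiler()
wrapped = profiler(hot_loop)   # wrap the function you want to profile
wrapped(1_000_000)             # call the wrapper with the usual arguments
profiler.print_stats()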

How do you use Pyinstrument?

Call Pyinstrument directly from the command line. Instead of writing python script.py, type pyinstrument script.py. Your script will run as normal, and at the end (or when you press ^C), Pyinstrument will output a colored summary showing where most of the time was spent.
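
Pyinstrument can also be driven from inside a script; a small sketch, assuming a reasonably recent pyinstrument (the summed loop is just a placeholder workload):

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

total = sum(i * i for i in range(1_000_000))   # code being profiled

profiler.stop()
print(profiler.output_text(unicode=True, color=True))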


2 Answers

There's the statprof module

pip install statprof (or easy_install statprof), then to use:

import statprof

statprof.start()
try:
    my_questionable_function()
finally:
    statprof.stop()
    statprof.display()

There's a bit of background on the module from this blog post:

Why would this matter, though? Python already has two built-in profilers: lsprof and the long-deprecated hotshot. The trouble with lsprof is that it only tracks function calls. If you have a few hot loops within a function, lsprof is nearly worthless for figuring out which ones are actually important.

A few days ago, I found myself in exactly the situation in which lsprof fails: it was telling me that I had a hot function, but the function was unfamiliar to me, and long enough that it wasn’t immediately obvious where the problem was.

After a bit of begging on Twitter and Google+, someone pointed me at statprof. But there was a problem: although it was doing statistical sampling (yay!), it was only tracking the first line of a function when sampling (wtf!?). So I fixed that, spiffed up the documentation, and now it’s both usable and not misleading. Here’s an example of its output, locating the offending line in that hot function more accurately:

  %   cumulative      self          
 time    seconds   seconds  name    
 68.75      0.14      0.14  scmutil.py:546:revrange
  6.25      0.01      0.01  cmdutil.py:1006:walkchangerevs
  6.25      0.01      0.01  revlog.py:241:__init__
  [...blah blah blah...]
  0.00      0.01      0.00  util.py:237:__get__
---
Sample count: 16
Total time: 0.200000 seconds

I have uploaded statprof to the Python package index, so it’s almost trivial to install: "easy_install statprof" and you’re up and running.

Since the code is up on github, please feel welcome to contribute bug reports and improvements. Enjoy!

Answered Sep 27 '22 by dbr


I can think of a few ways to do this:

  • Rather than trying to get a stack trace while the program is running, just fire an interrupt at it, and parse the output (a rough sketch of this sampling idea appears after this list). You could do this with a shell script or with another python script that invokes your app as a subprocess. The basic idea is explained and rather thoroughly defended in this answer to a C++-specific question.

    • Actually, rather than having to parse the output, you could register a postmortem routine (using sys.excepthook) that logs the stack trace. Unfortunately, Python doesn't have any way to continue from the point at which an exception occurred, so you can't resume execution after logging.
  • In order to actually get a stack trace from a running program, you may have to hack the implementation. So if you really want to do that, it may be worth your time to check out pypy, a Python implementation written mostly in Python. I've no idea how convenient it would be to do this in pypy. I'm guessing that it wouldn't be particularly convenient, since it would involve introducing a hook into basically every instruction, which I think would be prohibitively inefficient. Also, I don't think there will be much advantage over the first option, unless it takes a very long time to reach the state where you want to start doing stack traces.

  • There exists a set of macros for the gdb debugger intended to facilitate debugging Python itself. gdb can attach to an external process (in this case the instance of python which is executing your application) and do, well, pretty much anything with it. It seems that the macro pystack will get you a backtrace of the Python stack at the current point of execution. I think it would be pretty easy to automate this procedure, since you can (at worst) just feed text into gdb using expect or whatever.
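
To make the first bullet concrete, here is a rough, Unix-only sketch of in-process statistical sampling with a timer signal. The names SAMPLES and run_workload are illustrative, not from any library, and a Python-level signal handler only runs between bytecodes, so this is only an approximation of what a tool like statprof does:

import collections
import signal
import traceback

SAMPLES = collections.Counter()

def _sample(signum, frame):
    # At each timer tick, record the innermost application frame.
    top = traceback.extract_stack(frame)[-1]
    SAMPLES[f"{top.filename}:{top.lineno}:{top.name}"] += 1

signal.signal(signal.SIGVTALRM, _sample)
signal.setitimer(signal.ITIMER_VIRTUAL, 0.01, 0.01)   # sample every 10 ms of CPU time

def run_workload():
    # stand-in for the program you want to profile
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total

run_workload()

signal.setitimer(signal.ITIMER_VIRTUAL, 0)            # stop sampling
for location, count in SAMPLES.most_common(5):
    print(f"{count:5d}  {location}")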

Answered Sep 27 '22 by intuited