Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing for PyPy

(This is a follow-up to Statistical profiler for PyPy)

I'm running some Python code under PyPy and would like to optimize it.

In Python, I would use statprof or lineprofiler to know which exact lines are causing the slowdown and try to work around them. In PyPy though, both of the tools don't really report sensible results as PyPy might optimize away some lines. I would also prefer not to use cProfile as I find it very difficult to distil which part of the reported function is the bottleneck.

Does anyone have some tips on how to proceed? Perhaps another profiler which works nicely under PyPy? In general, how does one go about optimizing Python code for PyPy?

like image 967
Ecir Hana Avatar asked Jun 27 '13 17:06

Ecir Hana


1 Answers

If you understand the way the PyPy architecture works, you'll realize that trying to pinpoint individual lines of code isn't really productive. You start with a Python interpreter written in RPython, which then gets run through a tracing JIT which generates flow graphs and then transforms those graphs to optimize the RPython interpreter. What this means is the layout of your Python code being run by the RPython interpreter being JIT'ed may have very different structure than the optimized assembler actually be run. Furthermore, keep in mind that the JIT always works on a loop or a function, so getting line-by-line stats is not as meaningful. Consequently, I think cProfile may really be a good option for you since it will give you an idea of where to concentrate your optimization. Once you know which functions are your bottlenecks, you can spend your optimization efforts targeting those slower functions, rather than trying to fix a single line of Python code.

Keep in mind as you do this that PyPy has very different performance characteristics than cPython. Always try to write code in as simple a way as possible (that doesn't mean as few lines as possible btw). There are a few other heuristics that help such as using specialized lists, using objects over dicts when you have a small number of mostly constant keys, avoiding C extensions using the C Python API, etc.

If you really, really insist on trying to optimize at the line level. There are a few options. One is called JitViewer (https://foss.heptapod.net/pypy/jitviewer), which will let you have a very low level view of what the JIT is doing to your code. For instance, you can even see the assembler instructions which correspond to a Python loop. Using that tool, you can really get a sense for just how fast PyPy will behave with certain parts of the code, as you can now do silly things like count the number of assembler instructions used for your loop or something.

like image 119
jlund3 Avatar answered Sep 18 '22 15:09

jlund3