
As of June 2014, what tools should one consider for improving Python code performance? [closed]

Tags: python, numpy

I have written a small scientific experiment in Python, and I now need to optimize the code. After profiling, what tools should I consider to improve performance? From my understanding, the following wouldn't work:

Psyco: out of date (doesn't support Python 2.7)

Pyrex: last update was in 2010

PyPy: has issues with NumPy

What options remain now apart from writing C modules and then somehow interfacing them with Python (for example, by using Cython)?

asked Jan 31 '26 15:01 by bzm3r


1 Answer

You can use Cython to compile the bottlenecks to C. This is very effective for numerical code with tight loops: Python loops add considerable overhead, which disappears when the loop can be translated to pure C. In general, you can get very good performance for any statically typed code (that is, code where the types do not change and you can annotate them in the source).
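To get a feel for the loop overhead mentioned above before reaching for Cython, here is a minimal sketch using only the standard library. It is not Cython: it compares an explicit Python loop with the built-in `sum`, whose loop runs in C under CPython (an assumption about the interpreter). The function names `loop_sum`, `c_level_sum` and `compare` are made up for this illustration.

```python
import timeit


def loop_sum(n):
    """Add the integers 0..n-1 with an explicit loop executed by the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total


def c_level_sum(n):
    """Same result, but the iteration happens inside the built-in sum()."""
    return sum(range(n))


def compare(n=1_000_000, repeat=3):
    """Return the best wall-clock time (seconds) of each variant as (loop, builtin)."""
    t_loop = min(timeit.repeat(lambda: loop_sum(n), number=1, repeat=repeat))
    t_builtin = min(timeit.repeat(lambda: c_level_sum(n), number=1, repeat=repeat))
    return t_loop, t_builtin


if __name__ == "__main__":
    t_loop, t_builtin = compare()
    print(f"explicit loop: {t_loop:.4f} s, built-in sum: {t_builtin:.4f} s")
```

The exact ratio depends on the machine and the Python version, so measure it on your own code. A typed Cython loop aims to remove the same kind of per-iteration interpreter work.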

You can also write the core parts of your algorithm in C (or take an existing library) and wrap it. You can still do this by hand, writing a lot of boilerplate code with Cython or SWIG, but there are now tools like XDress that can generate it for you. If you are a FORTRAN person, f2py is your tool.

Modern CPUs have many cores, so you should be able to take advantage of them using Python's multiprocessing. The joblib developers provide a very nice, simplified interface for it.
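As a rough sketch of the multiprocessing route, assuming your experiment splits into independent runs: `simulate` below is a hypothetical stand-in for one CPU-bound run, and `run_all` is a helper name invented here. Only `multiprocessing.Pool` and its `map` method come from the standard library.

```python
import math
from multiprocessing import Pool


def simulate(seed):
    """Hypothetical CPU-bound task standing in for one run of the experiment."""
    total = 0.0
    for k in range(1, 50_000):
        total += math.sin(seed * k) / k
    return total


def run_all(seeds, workers=None):
    """Run simulate() once per seed and return the results in input order.

    workers=1 runs serially in this process (handy for debugging);
    any other value uses a process pool, and None lets Pool pick the
    number of worker processes itself.
    """
    seeds = list(seeds)
    if workers == 1:
        return [simulate(s) for s in seeds]
    with Pool(processes=workers) as pool:
        return pool.map(simulate, seeds)
```

In a script, call something like `run_all(range(8), workers=4)` from inside an `if __name__ == "__main__":` block, which is needed on platforms that start workers by spawning a fresh interpreter. With joblib the equivalent would be roughly `Parallel(n_jobs=4)(delayed(simulate)(s) for s in seeds)`.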

Some problems are also suitable for GPU computing, in which case you can use PyCUDA.

Theano is a library that bridges NumPy, Cython, SymPy, and PyCUDA. It can evaluate and compile expressions and generate GPU kernels.

Lastly, there is the future, with Numba and Blaze. Numba is a JIT compiler based on LLVM. Its development is not complete: some syntax is missing and bugs are quite common. I don't believe it is ready for production code unless you are sure your codebase is fully supported and you have very good test coverage. Blaze is a next-generation NumPy, with support for out-of-core storage and more flexible arrays, designed to use Numba as a backend to speed up execution. It is at a quite early stage of development.

Regarding your options:

  • Psyco: the author considered the project done and decided to collaborate with PyPy. Most of its features are in there now.
  • Pyrex: an abandoned project, from which Cython was forked. Cython has all its features and much more.
  • PyPy: not a real option for general scientific code because interfacing with C is too slow and incomplete. NumPy is only partially supported, and there is little hope SciPy ever will be (mainly because of the FORTRAN dependencies). This may change in the future, but probably not any time soon. Not being able to fully use C extensions severely limits the possibilities for using external code. I must add that I have used it successfully with NetworkX (a pure Python networks library), so there are use cases where it could be of use.
answered Feb 02 '26 06:02 by Davidmh


