Why does a trivial loop in Python run so much slower than the same loop in C++? And how can I optimize it? [duplicate]

If you run a nearly empty for loop in Python and in C++ (as follows), the speeds are very different: the Python version is more than a hundred times slower.

# Python (Python 2; use range instead of xrange on Python 3)
a = 0
for i in xrange(large_const):
    a += 1

// C++
int a = 0;
for (int i = 0; i < large_const; i++)
    a += 1;

Also, what can I do to speed up the Python version?

(Addendum: I used a bad example in the first version of this question. I didn't mean a = 1, which a C/C++ compiler could optimize away; I meant that the loop itself consumes a lot of resources, so a += 1 is a better example. And by "how to optimize" I mean: if the loop body is as simple as a += 1, how can it run at a speed similar to C/C++? In practice I use NumPy, so I can't use PyPy (for now). Are there general methods for making loops much faster, such as using a generator to build a list?)

chentingpc asked Jun 03 '13 14:06




1 Answer

A smart C compiler can probably optimize your loop away by recognizing that at the end, a will always be 1 (with a += 1, it can likewise replace the loop with its final value, large_const). Python can't do that, because when iterating over xrange it needs to call __next__ (called next in Python 2) on the iterator until it raises StopIteration. Python can't know whether __next__ will have side effects until it calls it, so there is no way to optimize the loop away. The take-away message from this paragraph is that it is MUCH HARDER for a Python "compiler" to optimize code than for a C compiler, because Python is such a dynamic language: the compiler would need to know how each object will behave in certain circumstances. In C that's much easier, because the compiler knows exactly what type every object is ahead of time.
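The point about side effects can be seen with a custom iterator. This is a sketch in Python 3 syntax; the class NoisyRange and the function run are hypothetical names used only for illustration. Even though the loop body only sets a = 1, the interpreter has to call __next__ on every iteration, because each call may do something observable (here, appending to a log):

```python
class NoisyRange:
    """Hypothetical iterator whose __next__ has a side effect: it logs each call."""

    def __init__(self, n):
        self.n = n
        self.i = 0
        self.log = []  # records the value of self.i at every __next__ call

    def __iter__(self):
        return self

    def __next__(self):
        self.log.append(self.i)  # observable side effect the loop must not skip
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        return self.i - 1


def run(iterable):
    a = 0
    for _ in iterable:
        a = 1  # the loop body from the first version of the question
    return a


it = NoisyRange(3)
print(run(it))  # prints 1
print(it.log)   # prints [0, 1, 2, 3]: four calls, the last one raises StopIteration
```

Skipping the loop because "a ends up as 1" would silently drop those four calls, which is why Python must execute every iteration.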

Of course, compiler aside, Python needs to do a lot more work. In C, you're working with base types using operations supported directly by hardware instructions. In Python, the interpreter executes the bytecode one instruction at a time in software, which is clearly going to take longer than running machine instructions directly. And the data model (e.g. calling __next__ over and over again) also leads to a lot of function calls that the C version doesn't need to make. Of course, Python does all this to be much more flexible than a compiled language can be.
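To see that per-iteration work, CPython's standard dis module can print the bytecode for the question's loop, written here as a Python 3 function with the hypothetical name count_loop. Exact opcode names differ between Python versions (for example, the addition opcode was renamed in 3.11), so treat the listing as illustrative rather than fixed:

```python
import dis


def count_loop(n):
    a = 0
    for _ in range(n):
        a += 1
    return a


# Print the bytecode CPython runs. The instructions from FOR_ITER up to the
# backward jump execute once per iteration, each dispatched by the interpreter.
dis.dis(count_loop)

# Collect the opcode names so the listing can be inspected programmatically.
opnames = [ins.opname for ins in dis.get_instructions(count_loop)]
print("FOR_ITER" in opnames)  # prints True
```

Every one of those instructions is decoded and dispatched in software, whereas the compiled C loop is a handful of machine instructions (or none, if the compiler removes it).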

The typical way to speed up Python code is to use libraries or built-in functions that provide a high-level interface to low-level compiled code. SciPy and NumPy are excellent examples of this kind of library. Other things you can look into are PyPy, which includes a JIT compiler (you probably won't reach native speed, but it will probably beat CPython, the most common implementation), or, for performance-critical sections of code, writing extensions in C or Fortran using the CPython API, Cython, or f2py.
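A minimal sketch of the first suggestion, using only the standard library (Python 3; the function names loop_count and builtin_count are hypothetical). Both functions compute the same count, but builtin_count hands the iteration to built-ins (sum over itertools.repeat), which in CPython loop inside C code instead of the bytecode interpreter. Timings depend on the machine and Python version, so no numbers are claimed here:

```python
import itertools
import timeit


def loop_count(n):
    """Pure-Python loop: every iteration goes through the bytecode interpreter."""
    a = 0
    for _ in range(n):
        a += 1
    return a


def builtin_count(n):
    """Same result; in CPython, itertools.repeat yields 1 n times and the
    built-in sum adds them up without running any Python bytecode per item."""
    return sum(itertools.repeat(1, n))


if __name__ == "__main__":
    n = 10_000_000  # hypothetical stand-in for the question's large_const
    for f in (loop_count, builtin_count):
        seconds = timeit.timeit(lambda: f(n), number=1)
        print(f"{f.__name__}: {seconds:.3f} s")
```

Since the question already uses NumPy, the analogous move there is to express the work as whole-array operations (for example np.ones(n, dtype=np.int64).sum()) so the loop runs in NumPy's compiled code rather than in Python.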

mgilson answered Sep 29 '22 16:09