Python: why are * and ** faster than / and sqrt()?

While optimising my code I realised the following:

    >>> from timeit import Timer as T
    >>> T(lambda : 1234567890 / 4.0).repeat()
    [0.22256922721862793, 0.20560789108276367, 0.20530295372009277]
    >>> from __future__ import division
    >>> T(lambda : 1234567890 / 4).repeat()
    [0.14969301223754883, 0.14155197143554688, 0.14141488075256348]
    >>> T(lambda : 1234567890 * 0.25).repeat()
    [0.13619112968444824, 0.1281130313873291, 0.12830305099487305]

and also:

    >>> from math import sqrt
    >>> T(lambda : sqrt(1234567890)).repeat()
    [0.2597470283508301, 0.2498021125793457, 0.24994492530822754]
    >>> T(lambda : 1234567890 ** 0.5).repeat()
    [0.15409398078918457, 0.14059877395629883, 0.14049601554870605]

I assume it has to do with the way Python is implemented in C, but I wonder if anybody would care to explain why this is so?

mac asked Nov 09 '11 16:11


1 Answer

The (somewhat unexpected) reason for your results is that Python seems to fold constant expressions involving floating-point multiplication and exponentiation, but not division. math.sqrt() is a different beast altogether since there's no bytecode for it and it involves a function call.

On Python 2.6.5, the following code:

    x1 = 1234567890.0 / 4.0
    x2 = 1234567890.0 * 0.25
    x3 = 1234567890.0 ** 0.5
    x4 = math.sqrt(1234567890.0)

compiles to the following bytecodes:

      # x1 = 1234567890.0 / 4.0
      4           0 LOAD_CONST               1 (1234567890.0)
                  3 LOAD_CONST               2 (4.0)
                  6 BINARY_DIVIDE
                  7 STORE_FAST               0 (x1)

      # x2 = 1234567890.0 * 0.25
      5          10 LOAD_CONST               5 (308641972.5)
                 13 STORE_FAST               1 (x2)

      # x3 = 1234567890.0 ** 0.5
      6          16 LOAD_CONST               6 (35136.418286444619)
                 19 STORE_FAST               2 (x3)

      # x4 = math.sqrt(1234567890.0)
      7          22 LOAD_GLOBAL              0 (math)
                 25 LOAD_ATTR                1 (sqrt)
                 28 LOAD_CONST               1 (1234567890.0)
                 31 CALL_FUNCTION            1
                 34 STORE_FAST               3 (x4)

As you can see, multiplication and exponentiation cost nothing at run time beyond loading a constant, since the result is computed when the code is compiled. Division takes longer since it happens at runtime. Square root is not only the most computationally expensive operation of the four, it also incurs various overheads that the others do not (attribute lookup, function call, etc.).
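To check on your own interpreter whether an expression was folded, you can look at the constants stored in the compiled code object. This is a minimal sketch, not the code used for the listing above: the helper names `compiled_constants` and `compiled_names` are made up for illustration, and it assumes a CPython 3 interpreter, where opcode names differ from 2.6 and where constant division may well be folded too.

```python
import dis


def compiled_constants(source):
    """Compile a statement and return the constants stored in its code object."""
    code = compile(source, "<example>", "exec")
    return code.co_consts


def compiled_names(source):
    """Compile a statement and return the names its code object refers to."""
    code = compile(source, "<example>", "exec")
    return code.co_names


# If the product was folded at compile time, the result appears as a constant.
print(compiled_constants("x2 = 1234567890.0 * 0.25"))

# A call to math.sqrt is not folded: 'math' and 'sqrt' are looked up at run time.
print(compiled_names("x4 = math.sqrt(1234567890.0)"))

# Full disassembly, comparable to the listing above (opcodes vary by version).
dis.dis(compile("x2 = 1234567890.0 * 0.25", "<example>", "exec"))
```

If `308641972.5` shows up among the constants, the multiplication never runs when the statement executes.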

If you eliminate the effect of constant folding, there's little to separate multiplication and division:

    In [16]: x = 1234567890.0

    In [17]: %timeit x / 4.0
    10000000 loops, best of 3: 87.8 ns per loop

    In [18]: %timeit x * 0.25
    10000000 loops, best of 3: 91.6 ns per loop

math.sqrt(x) is actually a little bit faster than x ** 0.5, presumably because it's a special case of the latter and can therefore be done more efficiently, in spite of the overheads:

    In [19]: %timeit x ** 0.5
    1000000 loops, best of 3: 211 ns per loop

    In [20]: %timeit math.sqrt(x)
    10000000 loops, best of 3: 181 ns per loop
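Outside IPython, roughly the same comparison can be made with the standard `timeit` module. The sketch below keeps the operand in a variable `x` so the compiler has nothing to fold. The loop count of 100,000 is an arbitrary choice to keep the run short, and the numbers you get will depend on your machine and Python version.

```python
import timeit

# The operand lives in a variable, so none of the expressions is a constant.
setup = "import math; x = 1234567890.0"
statements = ["x / 4.0", "x * 0.25", "x ** 0.5", "math.sqrt(x)"]
loops = 100_000

timings = {}
for stmt in statements:
    # Take the best of three runs, as %timeit does, and convert to ns per loop.
    runs = timeit.repeat(stmt, setup=setup, repeat=3, number=loops)
    timings[stmt] = min(runs) / loops * 1e9

for stmt, ns in timings.items():
    print(f"{stmt:<14} {ns:8.1f} ns per loop")
```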

edit 2011-11-16: Constant expression folding is done by Python's peephole optimizer. The source code (peephole.c) contains the following comment that explains why constant division isn't folded:

    case BINARY_DIVIDE:
        /* Cannot fold this operation statically since
           the result can depend on the run-time presence
           of the -Qnew flag */
        return 0;

The -Qnew flag enables "true division" defined in PEP 238.
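To see why the result of a constant division could not be fixed at compile time, compare the two meanings `/` could have for integers. This sketch assumes Python 3, where `/` is always true division; the `//` operator reproduces what Python 2's `/` did for integers when neither -Qnew nor `from __future__ import division` was in effect.

```python
numerator, denominator = 1234567890, 4

# Classic (floor) division: what Python 2's `/` gave for two ints by default.
floor_result = numerator // denominator

# True division: what -Qnew or `from __future__ import division` selected.
true_result = numerator / denominator

print(floor_result)  # 308641972
print(true_result)   # 308641972.5
```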

12 revs answered Oct 13 '22 05:10