I can't believe what I just measured:
python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 42.8 nsec per loop
python3 -m timeit "2 ** 0.5"
50000000 loops, best of 5: 4.93 nsec per loop
This goes against any intuition... it should be exactly the opposite!
Python 3.8.3 on macOS Catalina
Python 3 is precomputing the value of `2 ** 0.5` at compile time, since both operands are known at that time. The value of `sqrt`, however, is not known at compile time, so the computation necessarily occurs at run time. You aren't timing how long it takes to compute `2 ** 0.5`, but just the time it takes to load a constant.
A fairer comparison would be
$ python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 50.7 nsec per loop
$ python3 -m timeit -s "x = 2" "x**0.5"
5000000 loops, best of 5: 56.7 nsec per loop
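The same comparison can be reproduced from a script rather than the shell using the `timeit` module; this is just a sketch, and the iteration count is an arbitrary choice:

```python
import timeit

# Same comparison as the CLI runs above, driven from Python.
n = 1_000_000  # arbitrary number of iterations
t_sqrt = timeit.timeit("sqrt(2)", setup="from math import sqrt", number=n)
t_pow = timeit.timeit("x ** 0.5", setup="x = 2", number=n)
print(f"sqrt(2):  {t_sqrt / n * 1e9:.1f} ns per loop")
print(f"x ** 0.5: {t_pow / n * 1e9:.1f} ns per loop")
```

Absolute numbers will differ between machines and Python versions, but with the constant folding defeated by the variable `x`, the two timings land in the same ballpark.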
I'm not sure if there is a way to show unoptimized byte code. Python starts by parsing source code into an abstract syntax tree (AST):
>>> ast.dump(ast.parse("2**0.5"))
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=0.5)))])'
Update: This particular optimization is now applied directly to the abstract syntax tree, so the byte code is generated directly from something like
Module(body=Num(n=1.4142135623730951))
The `ast` module doesn't appear to apply the optimization.
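Even without `dis`, you can see where the folded value ends up: it is stored among the constants of the compiled code object. A minimal sketch:

```python
# ast.parse() still shows the original BinOp, but after compilation
# the folded result appears in the code object's constants.
code = compile("2 ** 0.5", "<string>", "eval")
print(code.co_consts)
```

Exactly which constants appear varies between Python versions (older peephole-based versions also keep the original `2` and `0.5`), but the folded value `1.4142135623730951` is always there.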
The compiler takes the AST and generates unoptimized byte code; in this case, I believe it would look (based on the output of `dis.dis("2**x")` and `dis.dis("x**0.5")`) like
LOAD_CONST 0 (2)
LOAD_CONST 1 (0.5)
BINARY_POWER
RETURN_VALUE
The raw byte code is then subject to modification by the peephole optimizer, which can reduce these 4 instructions to 2, as shown by the `dis` module.
The compiler then generates byte code from the AST.
>>> dis.dis("2**0.5")
1 0 LOAD_CONST 0 (1.4142135623730951)
2 RETURN_VALUE
[While the following paragraph was originally written with the idea of optimizing byte code in mind, the reasoning applies to optimizing the AST as well.]
Since nothing at runtime affects how the two `LOAD_CONST` and the following `BINARY_POWER` instructions are evaluated (for example, there are no name lookups), the peephole optimizer can take this sequence of byte codes, perform the computation of `2**0.5` itself, and replace the first three instructions with a single `LOAD_CONST` instruction that loads the result immediately.
To enhance chepner's answer, here's a proof:
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('2 ** 0.5')
1 0 LOAD_CONST 2 (1.4142135623730951)
3 RETURN_VALUE
vs.
>>> dis.dis('sqrt(2)')
1 0 LOAD_NAME 0 (sqrt)
3 LOAD_CONST 0 (2)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 RETURN_VALUE
>>> dis.dis('44442.3123 ** 0.5')
0 LOAD_CONST 0 (210.81345379268373)
2 RETURN_VALUE
I do not believe that `44442.3123 ** 0.5` is precomputed at compile time. We had better check the AST of the code.
>>> import ast
>>> import math
>>> code = ast.parse("2**2")
>>> ast.dump(code)
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=2)))])'
>>> code = ast.parse("math.sqrt(3)")
>>> ast.dump(code)
"Module(body=[Expr(value=Call(func=Attribute(value=Name(id='math', ctx=Load()), attr='sqrt', ctx=Load()), args=[Num(n=3)], keywords=[]))])"
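To make the contrast with the byte code explicit, one can walk the parsed tree and collect its constants; a sketch, assuming CPython 3.8+ where literals are represented as `ast.Constant` nodes:

```python
import ast

tree = ast.parse("2 ** 0.5", mode="eval")
# Only the original literals appear in the tree; the folded result
# (1.4142135623730951) is produced later, during byte-code compilation.
consts = [node.value for node in ast.walk(tree) if isinstance(node, ast.Constant)]
print(consts)
```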