I thought negating a word or XOR a word with 1
both should take 1 clock cycle of the processor, but if using Python3, the following code to count the parity of a word:
def parity(x: int) -> int:
p = 0
while x:
p = ~p
x = x & (x-1)
return p & 1
running against 10,000 test cases would report 4μs
average running time consistently, but if it is:
def parity(x: int) -> int:
p = 0
while x:
p = p ^ 1
x = x & (x-1)
return p
then it is consistently 5μs
. It actually has one less operation at the end not needing to & 1
.
Even
def parity(x: int) -> int:
p = 0
while x:
p += 1
x = x & (x-1)
return p % 2
is consistently running for 4μs
. To test increment vs addition, I changed the p += 1
to p += 7
and it is again consistently 4μs
. What is the reason negating, increment, or addition faster than XOR using Python3? (it is on a Mac if it makes any difference).
As it turns out, in Python, not all operators are created equally.
You mention "clock cycle"s in your question. It is true that a typical CPU can do many of these tasks in one cycle, however, execution of Python code (in CPython), is dictated by the compiled bytecode. This is a list of operations (op codes), which are mapped to C code in the CPython runtime.
The execution time of these portions of C code obviously may differ. In fact, you'll find that none of them associate with a "1 cycle" operation, but rather multiple function calls and branching statements.
If you disassemble the 2 functions you have given, dis.dis(parity)
, you will see how their op codes differ in the relevant section.
From the first function (negate)
10 LOAD_FAST 1 (p)
12 UNARY_INVERT
14 STORE_FAST 1 (p)
From the second function (XOR):
10 LOAD_FAST 1 (p)
12 LOAD_CONST 2 (1)
14 BINARY_XOR
16 STORE_FAST 1 (p)
Most notably, the UNARY_INVERT
is changed to a BINARY_XOR
. Check the master switch (opcode) { ... }
in Python/ceval.c to see how these op codes and their corresponding code differ.
Here is a small test program to compare the time of some of the different operators in Python.
import timeit
print('binary operators')
ops = ['+', '-', '^', '&', '|']
for op in ops:
t = timeit.timeit(f'a = a {op} b', 'a = 1; b = 2')
print(op, t)
print('unary operators')
ops = ['-', '~', 'not']
for op in ops:
t = timeit.timeit(f'a = {op} a', 'a = 1')
print(op, t)
Here are my results in CPython 3.7.2 64 bit (Windows)
binary operators
+ 0.038782999999999956
- 0.03799190000000002
^ 0.05287609999999998
& 0.03716779999999997
| 0.0518267
unary operators
- 0.020763399999999987
~ 0.020213900000000007
not 0.01701140000000001
From these results, it seems ^
or BINARY_XOR
is in fact the slowest operator (of those tested)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With