I want to measure the number of clock cycles it takes to do an addition operation in Python 3.
I wrote a program to calculate the average value of the addition operation:
from timeit import timeit
def test(n):
for i in range(n):
1 + 1
if __name__ == '__main__':
times = {}
for i in [2 ** n for n in range(10)]:
t = timeit.timeit("test(%d)" % i, setup="from __main__ import test", number=100000)
times[i] = t
print("%d additions takes %f" % (i, t))
keys = sorted(list(times.keys()))
for i in range(len(keys) - 2):
print("1 addition takes %f" % ((times[keys[i+1]] - times[keys[i]]) / (keys[i+1] - keys[i])))
Output:
16 additions takes 0.288647
32 additions takes 0.422229
64 additions takes 0.712617
128 additions takes 1.275438
256 additions takes 2.415222
512 additions takes 5.050155
1024 additions takes 10.381530
2048 additions takes 21.185604
4096 additions takes 43.122559
8192 additions takes 88.323853
16384 additions takes 194.353927
1 addition takes 0.008292
1 addition takes 0.010068
1 addition takes 0.008654
1 addition takes 0.010318
1 addition takes 0.008349
1 addition takes 0.009075
1 addition takes 0.008794
1 addition takes 0.008905
1 addition takes 0.010293
1 addition takes 0.010413
1 addition takes 0.010551
1 addition takes 0.010711
1 addition takes 0.011035
So according to this output one addition takes approximately 0.0095 usecs. Following this page instructions I calculated that one addition takes 25 CPU cycles. Is this a normal value and why? Because assembly instruction ADD only takes 1-2 CPU cycles.
You are timing a function call (test()
), a for
loop, and a call to range()
. The addition is not being timed at all.
def test(n):
for i in range(n):
1 + 1
import dis
dis.dis(test)
Here is the byte code for your test function (does not include the call to test()
):
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_GLOBAL 0 (range)
6 LOAD_FAST 0 (n)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 10 (to 26)
16 STORE_FAST 1 (i)
3 19 LOAD_CONST 2 (2) ****
22 POP_TOP
23 JUMP_ABSOLUTE 13
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
**** Notice, the addition is done at compilation time. Quite a few other languages and their compilers will do that, including C. However, the standards rarely define when a 1 + 1
is actually done, so it is often implementation dependant.
EDIT:
Your timeit
function call could be this:
t = timeit("x += 1", setup="x = 1", number=100000)
We can create a dummy function to check the operation:
def myfunc(x):
x += 1
import dis
dis.dis(myfunc)
Making that change gives:
1 additions takes 0.008976
2 additions takes 0.007419
4 additions takes 0.007282
8 additions takes 0.007693
16 additions takes 0.007026
32 additions takes 0.007793
64 additions takes 0.010168
128 additions takes 0.008124
256 additions takes 0.009064
512 additions takes 0.007256
1 addition takes -0.001557
1 addition takes -0.000068
1 addition takes 0.000103
1 addition takes -0.000083
1 addition takes 0.000048
1 addition takes 0.000074
1 addition takes -0.000032
1 addition takes 0.000007
26 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (x)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Note that x += 1
is an INPLACE_ADD
, different to x = x + 1
which is a BINARY_ADD
. So you need to decide which OPCode you want to measure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With