I was trying to optimize a program I'm tinkering with, when I noticed that doing value = i % 65536
appeared to be running slower then doing value = i % (2**16)
.
To test this, I ran the following program:
import cProfile
import pstats
AMOUNT = 100000000
def test1():
for i in xrange(AMOUNT):
value = i % 65536
return
def test2():
for i in xrange(AMOUNT):
value = i % (256**2)
return
def test3():
for i in xrange(AMOUNT):
value = i % (16**4)
return
def test4():
for i in xrange(AMOUNT):
value = i % (4**8)
return
def test5():
for i in xrange(AMOUNT):
value = i % (2**16)
return
def run_tests():
test1()
test2()
test3()
test4()
test5()
return
if __name__ == '__main__':
cProfile.run('run_tests()', 'results')
stats = pstats.Stats('results')
stats.sort_stats('calls', 'nfl')
stats.print_stats()
...which produced the following output:
Fri May 11 15:11:59 2012 results
8 function calls in 40.473 seconds
Ordered by: call count, name/file/line
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 40.473 40.473 <string>:1(<module>)
1 0.000 0.000 40.473 40.473 test.py:31(run_tests)
1 10.466 10.466 10.466 10.466 test.py:6(test1)
1 7.475 7.475 7.475 7.475 test.py:11(test2)
1 7.485 7.485 7.485 7.485 test.py:16(test3)
1 7.539 7.539 7.539 7.539 test.py:21(test4)
1 7.508 7.508 7.508 7.508 test.py:26(test5)
Using 65536
was the slowest at 10.466 seconds, while doing 256**2
was the fastest at 7.475 seconds (with the other possible exponent values falling in between). Granted, this difference in speed is only noticeable given high amounts of repetition, but I'm still curious as to why this occurs.
Why is taking the mod of a number by 65536
slower then taking the mod using exponents? They should evaluate to the same number, and I would have thought that it would take longer for the python interpreter to fully evaluate exponents before taking the mod.
By extension, is it generally more efficient to use powers of two in python expressions rather then fully typing the number out? And does this pattern hold true for operations besides modulus or for other numbers besides 2
?
(btw, I'm using Python 2.7.2 (32 bit), and I ran the above on a 64 bit Windows 7 laptop).
EDIT:
So I tried reversing the order of the functions I call, and now the opposite is true. It looks like whatever the first function is in run_tests
will always run a bit slower when using cProfile, which is weird. So, lesson learned, I guess -- profilers are weird :D
pow() is more efficient than chained multiplication at powers larger than 5 and always more efficient than the ** operator, so there is never a reason to use ** .
Modular exponentiation is exponentiation performed over a modulus. It is useful in computer science, especially in the field of public-key cryptography, where it is used in both Diffie-Hellman Key Exchange and RSA public/private keys.
There is no difference in the generated bytecode, because the compiler does its job well and optimizes away the constant arithmetic expression. That means your test results are just a coincidence (try timing the functions in a different order!).
>>> import dis
>>> dis.dis(test1)
2 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_GLOBAL 1 (AMOUNT)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
3 19 LOAD_FAST 0 (i)
22 LOAD_CONST 1 (65536)
25 BINARY_MODULO
26 STORE_FAST 1 (value)
29 JUMP_ABSOLUTE 13
>> 32 POP_BLOCK
4 >> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
>>> dis.dis(test5)
2 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_GLOBAL 1 (AMOUNT)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
3 19 LOAD_FAST 0 (i)
22 LOAD_CONST 3 (65536)
25 BINARY_MODULO
26 STORE_FAST 1 (value)
29 JUMP_ABSOLUTE 13
>> 32 POP_BLOCK
4 >> 33 LOAD_CONST 0 (None)
36 RETURN_VALUE
(well actually there is a difference: The number is stored at different offsets in the constant table. I can't imagine this causing any difference, though).
For completeness, here's a proper test that uses the timeit
module:
import timeit
setup = "i = 1337"
best1 = best2 = float("inf")
for _ in range(5000):
best1 = min(best1, timeit.timeit("i % 65536", setup=setup, number=10000))
for _ in range(5000):
best2 = min(best2, timeit.timeit("i % (2**16)", setup=setup, number=10000))
print best1
print best2
Note that I am measuring the minimum time needed, rather than the average. If it takes longer for some reason, this just means that it was interrupted more often (because the code doesn't depend on anything but the power of your CPU).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With