 

Why is taking the mod of a number in python faster with exponents?

I was trying to optimize a program I'm tinkering with when I noticed that doing value = i % 65536 appeared to run slower than doing value = i % (2**16).

To test this, I ran the following program:

import cProfile
import pstats

AMOUNT = 100000000

def test1():
    for i in xrange(AMOUNT):
        value = i % 65536
    return

def test2():
    for i in xrange(AMOUNT):
        value = i % (256**2)
    return

def test3():
    for i in xrange(AMOUNT):
        value = i % (16**4)
    return

def test4():
    for i in xrange(AMOUNT):
        value = i % (4**8)
    return

def test5():
    for i in xrange(AMOUNT):
        value = i % (2**16)
    return

def run_tests():
    test1()
    test2()
    test3()
    test4()
    test5()
    return

if __name__ == '__main__':
    cProfile.run('run_tests()', 'results')
    stats = pstats.Stats('results')
    stats.sort_stats('calls', 'nfl')
    stats.print_stats()

...which produced the following output:

Fri May 11 15:11:59 2012    results

         8 function calls in 40.473 seconds

   Ordered by: call count, name/file/line

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000   40.473   40.473 <string>:1(<module>)
        1    0.000    0.000   40.473   40.473 test.py:31(run_tests)
        1   10.466   10.466   10.466   10.466 test.py:6(test1)
        1    7.475    7.475    7.475    7.475 test.py:11(test2)
        1    7.485    7.485    7.485    7.485 test.py:16(test3)
        1    7.539    7.539    7.539    7.539 test.py:21(test4)
        1    7.508    7.508    7.508    7.508 test.py:26(test5)

Using 65536 was the slowest at 10.466 seconds, while doing 256**2 was the fastest at 7.475 seconds (with the other exponent forms falling in between). Granted, the speed difference is only noticeable with this many repetitions, but I'm still curious why it occurs.

Why is taking the mod of a number by 65536 slower than taking the mod using exponents? They should evaluate to the same number, and I would have thought the Python interpreter would take longer, not less time, since it has to fully evaluate the exponent before taking the mod.

By extension, is it generally more efficient to use powers of two in Python expressions rather than typing the number out in full? And does this pattern hold for operations besides modulus, or for bases other than 2?

(By the way, I'm using Python 2.7.2 (32-bit), and I ran the above on a 64-bit Windows 7 laptop.)

EDIT:
So I tried reversing the order of the functions I call, and now the opposite is true. It looks like whatever the first function is in run_tests will always run a bit slower when using cProfile, which is weird. So, lesson learned, I guess -- profilers are weird :D
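One way to sidestep that first-function penalty (a sketch, not from the original post; `warmup_then_profile` is a made-up helper name, and the syntax is Python 3) is to call every function once unprofiled before turning the profiler on:

```python
import cProfile

def warmup_then_profile(funcs):
    # Run each function once unprofiled first (warm-up), so caches and
    # allocator state are hot before measurement begins.
    for f in funcs:
        f()
    # Now profile the real runs; the first function in the list no
    # longer pays a cold-start penalty.
    profiler = cProfile.Profile()
    profiler.enable()
    for f in funcs:
        f()
    profiler.disable()
    return profiler
```

You can then feed the returned profiler to `pstats.Stats` as usual.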

Asked May 11 '12 by Michael0x2a



1 Answer

There is no difference in the generated bytecode, because the compiler does its job well and optimizes away the constant arithmetic expression. That means your test results are just a coincidence (try timing the functions in a different order!).

>>> import dis
>>> dis.dis(test1)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               1 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        
>>> dis.dis(test5)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               3 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        

(Well, actually there is one difference: the number is stored at a different offset in the constants table, as the LOAD_CONST arguments show. I can't imagine that causing any measurable speed difference, though.)
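You can verify the constant folding directly without reading disassembly (Python 3 syntax shown; CPython 2.x folds this the same way in its peephole optimizer): compile the expression and inspect the code object's constants table, where the already-evaluated result appears.

```python
# CPython folds the constant expression 2**16 at compile time, so the
# code object holds the literal 65536 rather than the subexpression.
code = compile("value = i % (2**16)", "<test>", "exec")
print(code.co_consts)  # the folded constant 65536 appears in this tuple
```

No exponentiation survives into the bytecode, so none happens at runtime.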

For completeness, here's a proper test that uses the timeit module:

import timeit

setup = "i = 1337"

best1 = best2 = float("inf")
for _ in range(5000):
  best1 = min(best1, timeit.timeit("i % 65536", setup=setup, number=10000))
for _ in range(5000):
  best2 = min(best2, timeit.timeit("i % (2**16)", setup=setup, number=10000))
print best1
print best2

Note that I am measuring the minimum time needed rather than the average. Runs that take longer were simply interrupted more often by other processes; since this code depends on nothing but raw CPU speed, the minimum is the most reliable estimate.
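The same min-of-many-runs idea can be written more compactly with the standard library's timeit.repeat (Python 3 syntax shown; repeat and number are real parameters of that function, though the counts here are arbitrary):

```python
import timeit

# repeat() returns one total time per repetition; taking the minimum
# discards repetitions slowed down by unrelated system activity.
t1 = min(timeit.repeat("i % 65536", setup="i = 1337", repeat=5, number=100000))
t2 = min(timeit.repeat("i % (2**16)", setup="i = 1337", repeat=5, number=100000))
print(t1, t2)  # the two timings come out essentially identical
```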

Answered Sep 22 '22 by Niklas B.