Why is taking the mod of a number in python faster with exponents?

Tags:

I was trying to optimize a program I'm tinkering with, when I noticed that doing value = i % 65536 appeared to be running slower then doing value = i % (2**16).

To test this, I ran the following program:

import cProfile
import pstats

AMOUNT = 100000000

def test1():
    for i in xrange(AMOUNT):
        value = i % 65536
    return

def test2():
    for i in xrange(AMOUNT):
        value = i % (256**2)
    return

def test3():
    for i in xrange(AMOUNT):
        value = i % (16**4)
    return

def test4():
    for i in xrange(AMOUNT):
        value = i % (4**8)
    return

def test5():
    for i in xrange(AMOUNT):
        value = i % (2**16)
    return

def run_tests():
    test1()
    test2()
    test3()
    test4()
    test5()
    return

if __name__ == '__main__':
    cProfile.run('run_tests()', 'results')
    stats = pstats.Stats('results')
    stats.sort_stats('calls', 'nfl')
    stats.print_stats()

...which produced the following output:

Fri May 11 15:11:59 2012    results

         8 function calls in 40.473 seconds

   Ordered by: call count, name/file/line

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000   40.473   40.473 <string>:1(<module>)
        1    0.000    0.000   40.473   40.473 test.py:31(run_tests)
        1   10.466   10.466   10.466   10.466 test.py:6(test1)
        1    7.475    7.475    7.475    7.475 test.py:11(test2)
        1    7.485    7.485    7.485    7.485 test.py:16(test3)
        1    7.539    7.539    7.539    7.539 test.py:21(test4)
        1    7.508    7.508    7.508    7.508 test.py:26(test5)

Using 65536 was the slowest at 10.466 seconds, while doing 256**2 was the fastest at 7.475 seconds (with the other possible exponent values falling in between). Granted, this difference in speed is only noticeable given high amounts of repetition, but I'm still curious as to why this occurs.

Why is taking the mod of a number by 65536 slower then taking the mod using exponents? They should evaluate to the same number, and I would have thought that it would take longer for the python interpreter to fully evaluate exponents before taking the mod.

By extension, is it generally more efficient to use powers of two in python expressions rather then fully typing the number out? And does this pattern hold true for operations besides modulus or for other numbers besides 2?

(btw, I'm using Python 2.7.2 (32 bit), and I ran the above on a 64 bit Windows 7 laptop).

EDIT:
So I tried reversing the order of the functions I call, and now the opposite is true. It looks like whatever the first function is in run_tests will always run a bit slower when using cProfile, which is weird. So, lesson learned, I guess -- profilers are weird :D

875

asked May 11 '12 22:05

Michael0x2a

1 Answers

There is no difference in the generated bytecode, because the compiler does its job well and optimizes away the constant arithmetic expression. That means your test results are just a coincidence (try timing the functions in a different order!).

>>> import dis
>>> dis.dis(test1)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               1 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        
>>> dis.dis(test5)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               3 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE

(well actually there is a difference: The number is stored at different offsets in the constant table. I can't imagine this causing any difference, though).

For completeness, here's a proper test that uses the timeit module:

import timeit

setup = "i = 1337"

best1 = best2 = float("inf")
for _ in range(5000):
  best1 = min(best1, timeit.timeit("i % 65536", setup=setup, number=10000))
for _ in range(5000):
  best2 = min(best2, timeit.timeit("i % (2**16)", setup=setup, number=10000))
print best1
print best2

Note that I am measuring the minimum time needed, rather than the average. If it takes longer for some reason, this just means that it was interrupted more often (because the code doesn't depend on anything but the power of your CPU).

135

answered Sep 22 '22 16:09

Niklas B.

Related questions
                            
                                How to reverse a dictionary that has repeated values
                            
                                2 techniques for including files in a Python distribution: which is better?
                            
                                Accessing static properties in Python
                            
                                Populating a SQLite3 database from a .txt file with Python
                            
                                Ruby equivalent to Python __main__ [duplicate]
                            
                                Generate XML file from model data
                            
                                Python pretty print an XML given an XML string
                            
                                Any Python password-generators that are readable and pronounceable?
                            
                                how to insert some text in all django context using django middleware
                            
                                Tuple Unpacking Similar to Python, but in Common Lisp
                            
                                Python Wrap Class Method
                            
                                ValueError: invalid \x escape
                            
                                How to assign a string value to an array in numpy?
                            
                                Neural Network based ranking of documents
                            
                                how to make python script self-executable [duplicate]
                            
                                return actual tweets in tweepy?
                            
                                How to prevent auto escape in Django templates?
                            
                                To send threekeys using send_keys() in selenium python webdriver
                            
                                Parsing configure file with same section name in python
                            
                                How to run a Python script portably without specifying its full path

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is taking the mod of a number in python faster with exponents?

Tags:

python

optimization

profiling

Michael0x2a

People also ask

1 Answers

Niklas B.

Recent Activity

Donate For Us