I have recently been working on creating a prime-number finding program. However, I came to notice that one function was much slower when it used arguments than when it used pre-set values. Across 3 different versions, it becomes clear that the variables are significantly slowing down the program, and I would like to know why. <h3> Version 1: 7.5 seconds</h3> Here was the original (somewhat simplified for this question) function: <pre class="prettyprint"><code>def version1(n, p): return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p) </code></pre> When ran with the <code>timeit</code> module 100 times: <pre class="prettyprint"><code>timeit.timeit("version1(200, 500000000)", "from __main__ import version1", number=100) </code></pre> it takes <code>7.5</code> seconds. <h3> Version 2: 0.0001 seconds</h3> However, here is the second version, in which there are no parameters, and the numbers are directly placed in the return value. The equasion is exactly the same as Version 1: <pre class="prettyprint"><code>def version2(): return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000) </code></pre> When ran with the <code>timeit</code> module 100 times: <pre class="prettyprint"><code>timeit.timeit("version2()", "from __main__ import version2", number=100 </code></pre> in this only takes <code>0.00001</code> seconds! <h3> Version 3: 6.3 seconds</h3> Lastly, for completeness, I tried a version that had no parameters, but still kept its values as variables: <pre class="prettyprint"><code>def version3(): n = 200 p = 500000000 return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p) </code></pre> When ran with <code>timeit</code>: <pre class="prettyprint"><code>timeit.timeit("version3()", "from __main__ import version3", number = 100) </code></pre> it took <code>6.3</code> seconds, which is relatively close to Version 1. Why is it that the same function can take so much longer when there are variables involved, and how can I make Version 1 more effient?

Python pre-computes calculations when compiling as a so-called peep-hole optimisation: <pre class="prettyprint"><code>>>> import dis >>> def version2(): ... return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000) ... >>> dis.dis(version2) 2 0 LOAD_CONST 13 (39998) 2 RETURN_VALUE </code></pre> <code>version2()</code> returns the already-calculated value, and does no actual work. Returning a constant is of course much, much faster than having to calculate the value each time. See the <code>fold_binops_on_constants</code> function in the <code>peephole.c</code> Python source file for the details on how the compiler does this. As a result, compiling <code>version2</code> takes (a lot) more time than <code>version1</code>: <pre class="prettyprint"><code>>>> import timeit >>> version1_text = '''\ ... def version1(n, p): ... return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p) ... ''' >>> version2_text = '''\ ... def version2(): ... return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000) ... ''' >>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version1_text as t', number=10) 0.00028649598243646324 >>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version2_text as t', number=10) 2.2103765579813626 </code></pre> Good thing Python caches the bytecode results of compilation! The intermediary results of each sub-expression is also stored in the <code>co_consts</code> attribute of the code object, and some of those are rather large: <pre class="prettyprint"><code>>>> import sys >>> consts = version2.__code__.co_consts >>> for obj in consts: ... size = sys.getsizeof(obj) ... print(f'{type(obj)!s:<18} {size:<8} {"<too large to print>" if size > 100 else obj}') ... <class 'NoneType'> 16 None <class 'int'> 28 200 <class 'int'> 28 2 <class 'int'> 28 1 <class 'int'> 28 500000000 <class 'int'> 28 40000 <class 'int'> 28 39998 <class 'int'> 66666692 <too large to print> <class 'int'> 66666692 <too large to print> <class 'int'> 28 39998 <class 'int'> 28 40000 <class 'int'> 28 39998 <class 'int'> 24 0 <class 'int'> 28 39998 </code></pre> so this did make the bytecode cache a little larger: <pre class="prettyprint"><code>>>> import marshal >>> len(marshal.dumps(version1.__code__)) 129 >>> len(marshal.dumps(version2.__code__)) 133333481 </code></pre> That's a minimum of 127MB for the <code>.pyc</code> file for the module that contains your non-argument version!

Why does using arguments make this function so much slower?

Version 1: 7.5 seconds

Here was the original (somewhat simplified for this question) function:

def version1(n, p):
  return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)

When ran with the timeit module 100 times:

timeit.timeit("version1(200, 500000000)", "from __main__ import version1", number=100)

it takes 7.5 seconds.

Version 2: 0.0001 seconds

However, here is the second version, in which there are no parameters, and the numbers are directly placed in the return value. The equasion is exactly the same as Version 1:

def version2():
  return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)

When ran with the timeit module 100 times:

timeit.timeit("version2()", "from __main__ import version2", number=100

in this only takes 0.00001 seconds!

Version 3: 6.3 seconds

Lastly, for completeness, I tried a version that had no parameters, but still kept its values as variables:

def version3():
  n = 200
  p = 500000000
  return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)

When ran with timeit:

timeit.timeit("version3()", "from __main__ import version3", number = 100)

it took 6.3 seconds, which is relatively close to Version 1.

Why is it that the same function can take so much longer when there are variables involved, and how can I make Version 1 more effient?

929

asked May 06 '17 18:05

Marco

1 Answers

Python pre-computes calculations when compiling as a so-called peep-hole optimisation:

>>> import dis
>>> def version2():
...   return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
...
>>> dis.dis(version2)
  2           0 LOAD_CONST              13 (39998)
              2 RETURN_VALUE

version2() returns the already-calculated value, and does no actual work. Returning a constant is of course much, much faster than having to calculate the value each time.

See the fold_binops_on_constants function in the peephole.c Python source file for the details on how the compiler does this.

As a result, compiling version2 takes (a lot) more time than version1:

>>> import timeit
>>> version1_text = '''\
... def version1(n, p):
...   return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)
... '''
>>> version2_text = '''\
... def version2():
...   return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
... '''
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version1_text as t', number=10)
0.00028649598243646324
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version2_text as t', number=10)
2.2103765579813626

Good thing Python caches the bytecode results of compilation!

The intermediary results of each sub-expression is also stored in the co_consts attribute of the code object, and some of those are rather large:

>>> import sys
>>> consts = version2.__code__.co_consts
>>> for obj in consts:
...     size = sys.getsizeof(obj)
...     print(f'{type(obj)!s:<18} {size:<8} {"<too large to print>" if size > 100 else obj}')
...
<class 'NoneType'> 16       None
<class 'int'>      28       200
<class 'int'>      28       2
<class 'int'>      28       1
<class 'int'>      28       500000000
<class 'int'>      28       40000
<class 'int'>      28       39998
<class 'int'>      66666692 <too large to print>
<class 'int'>      66666692 <too large to print>
<class 'int'>      28       39998
<class 'int'>      28       40000
<class 'int'>      28       39998
<class 'int'>      24       0
<class 'int'>      28       39998

so this did make the bytecode cache a little larger:

>>> import marshal
>>> len(marshal.dumps(version1.__code__))
129
>>> len(marshal.dumps(version2.__code__))
133333481

That's a minimum of 127MB for the .pyc file for the module that contains your non-argument version!

124

answered Oct 05 '22 05:10

Martijn Pieters

Related questions
                            
                                How to calculate p-value for two lists of floats?
                            
                                How to read numbers in python from csv file?
                            
                                Testing Flask routes do and don't exist
                            
                                Jupyter notebook run all cells on open
                            
                                Unzip buffer with Python?
                            
                                Getting parent of AST node in Python
                            
                                Elegant iteration over five dice
                            
                                pandas cross join no columns in common [duplicate]
                            
                                python numpy.savetxt header has extra character #
                            
                                Iterate through a dataframe by index
                            
                                PyCharm debugger fails with AttributeError
                            
                                Django - How to filter by date with Django Rest Framework?
                            
                                Display a pandas data frame with Bokeh
                            
                                convert pandas dataframe column from hex string to int
                            
                                Is a python dict comprehension always "last wins" if there are duplicate keys
                            
                                Insert result of sklearn CountVectorizer in a pandas dataframe
                            
                                unexpected type: <class 'pyspark.sql.types.DataTypeSingleton'> when casting to Int on a ApacheSpark Dataframe
                            
                                Does python `str()` function call `__str__()` function of a class?
                            
                                Run shell script from python with permissions
                            
                                Python scatter plot different colors depending on value

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does using arguments make this function so much slower?

Tags:

performance

python

algorithm

math