I have recently been working on creating a prime-number finding program. However, I came to notice that one function was much slower when it used arguments than when it used pre-set values.
Across 3 different versions, it becomes clear that the variables are significantly slowing down the program, and I would like to know why.
Here was the original (somewhat simplified for this question) function:
def version1(n, p):
return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)
When ran with the timeit
module 100 times:
timeit.timeit("version1(200, 500000000)", "from __main__ import version1", number=100)
it takes 7.5
seconds.
However, here is the second version, in which there are no parameters, and the numbers are directly placed in the return value. The equasion is exactly the same as Version 1:
def version2():
return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
When ran with the timeit
module 100 times:
timeit.timeit("version2()", "from __main__ import version2", number=100
in this only takes 0.00001
seconds!
Lastly, for completeness, I tried a version that had no parameters, but still kept its values as variables:
def version3():
n = 200
p = 500000000
return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)
When ran with timeit
:
timeit.timeit("version3()", "from __main__ import version3", number = 100)
it took 6.3
seconds, which is relatively close to Version 1.
Why is it that the same function can take so much longer when there are variables involved, and how can I make Version 1 more effient?
Also from what I gather, the reason function calls are so slow is because python needs to look up to make sure the function still exists before it can run it or something? Isn't there any way to just tell it to like...
Yes method calls slow down the code execution a tiny little bit, if they a not inlined by the c#-compiler or the jit-compiler. However, unless your code runs in a loop and is executed a million times or so, you should really focus on producing clean, understandable and maintainable code.
Passing many arguments will have no effect to the function, as long as the parameters are filled in.
An argument is a way for you to provide more information to a function. The function can then use that information as it runs, like a variable. Said differently, when you create a function, you can pass in data in the form of an argument, also called a parameter.
Python pre-computes calculations when compiling as a so-called peep-hole optimisation:
>>> import dis
>>> def version2():
... return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
...
>>> dis.dis(version2)
2 0 LOAD_CONST 13 (39998)
2 RETURN_VALUE
version2()
returns the already-calculated value, and does no actual work. Returning a constant is of course much, much faster than having to calculate the value each time.
See the fold_binops_on_constants
function in the peephole.c
Python source file for the details on how the compiler does this.
As a result, compiling version2
takes (a lot) more time than version1
:
>>> import timeit
>>> version1_text = '''\
... def version1(n, p):
... return ((n*n - 2) & ((1 << p) - 1)) + ((n*n - 2) >> p)
... '''
>>> version2_text = '''\
... def version2():
... return ((200*200 - 2) & ((1 << 500000000) - 1)) + ((200*200 - 2) >> 500000000)
... '''
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version1_text as t', number=10)
0.00028649598243646324
>>> timeit.timeit("compile(t, '', 'exec')", 'from __main__ import version2_text as t', number=10)
2.2103765579813626
Good thing Python caches the bytecode results of compilation!
The intermediary results of each sub-expression is also stored in the co_consts
attribute of the code object, and some of those are rather large:
>>> import sys
>>> consts = version2.__code__.co_consts
>>> for obj in consts:
... size = sys.getsizeof(obj)
... print(f'{type(obj)!s:<18} {size:<8} {"<too large to print>" if size > 100 else obj}')
...
<class 'NoneType'> 16 None
<class 'int'> 28 200
<class 'int'> 28 2
<class 'int'> 28 1
<class 'int'> 28 500000000
<class 'int'> 28 40000
<class 'int'> 28 39998
<class 'int'> 66666692 <too large to print>
<class 'int'> 66666692 <too large to print>
<class 'int'> 28 39998
<class 'int'> 28 40000
<class 'int'> 28 39998
<class 'int'> 24 0
<class 'int'> 28 39998
so this did make the bytecode cache a little larger:
>>> import marshal
>>> len(marshal.dumps(version1.__code__))
129
>>> len(marshal.dumps(version2.__code__))
133333481
That's a minimum of 127MB for the .pyc
file for the module that contains your non-argument version!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With