I set up a simple custom function that takes some default arguments (Python 3.5):
def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d
and timed different calls to it with or without specifying argument values:
Without specifying arguments:
%timeit foo()
The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 361 ns per loop
Specifying arguments:
%timeit foo(a=10, b=20, c=30, d=40)
The slowest run took 12.83 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 446 ns per loop
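If you are not in IPython, a rough way to reproduce these measurements is the standard timeit module (a minimal sketch; it assumes foo is defined in the running script, and the absolute numbers will of course differ by machine and interpreter build):

import timeit

# Pull foo into the timed statement's namespace, as timeit requires.
setup = "from __main__ import foo"

# repeat=3 mimics %timeit's "best of 3"; take the minimum of the runs.
no_args = min(timeit.repeat("foo()", setup=setup,
                            number=1000000, repeat=3))
kw_args = min(timeit.repeat("foo(a=10, b=20, c=30, d=40)", setup=setup,
                            number=1000000, repeat=3))

print(no_args, kw_args)  # total seconds per million calls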
As you can see, there is a noticeable increase in the time required for the call that specifies arguments compared to the one that does not. In simple one-off calls this might be negligible, but the overhead scales and becomes more noticeable as the number of calls to the function grows:
No arguments:
%timeit for i in range(10000): foo()
100 loops, best of 3: 3.83 ms per loop
With Arguments:
%timeit for i in range(10000): foo(a=10, b=20, c=30, d=40)
100 loops, best of 3: 4.68 ms per loop
The same behaviour is also present in Python 2.7, where the time difference between these calls was actually a bit larger: foo() -> 291 ns and foo(a=10, b=20, c=30, d=40) -> 410 ns.
Why does this happen? Should I generally try and avoid specifying argument values during calls?
One technique compilers use to avoid function-call overhead is inlining: putting the body of the called function in the place where the call is made, so no call happens at all. CPython's interpreter does not inline Python-level calls for you, but you can do it by hand in a hot loop, as sketched below.
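A minimal sketch of manual inlining, using the foo from this question (the loop is purely illustrative):

# Calling the function 10,000 times pays the call overhead each time:
total = 0
for i in range(10000):
    total += foo()          # one CALL_FUNCTION per iteration

# Manually "inlined" version: the body of foo replaces the call, so
# the loop avoids the call machinery entirely (at a readability cost):
a, b, c, d = 10, 20, 30, 40
total = 0
for i in range(10000):
    total += a * b + c * d  # no function call at all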
In a nutshell: function calls may or may not impact performance, and the only way to tell is to profile your code. Don't try to guess where the slow spots are, because the interpreter and hardware have some incredible tricks up their sleeves; profile the code to find them.
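A minimal profiling sketch with the standard cProfile module (foo is repeated from the question so the snippet is self-contained; the workload is made up for illustration):

import cProfile

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

def workload():
    # 10,000 calls with keyword arguments, as in the timings above
    for i in range(10000):
        foo(a=10, b=20, c=30, d=40)

# Prints per-function call counts and cumulative times, so the real
# hot spots show up instead of guessed ones.
cProfile.run("workload()")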
Why does this happen? Should I avoid specifying argument values during calls?
Generally, no. The real reason you're able to see this at all is that the function you are using is simply not computationally intensive. As such, the time required for the additional bytecode instructions issued when arguments are supplied can be detected through timing.
If, for example, you had a more intensive function of the form:
def foo_intensive(a=10, b=20, c=30, d=40):
    [i * j for i in range(a * b) for j in range(c * d)]
it will show practically no difference in time required:
%timeit foo_intensive()
10 loops, best of 3: 32.7 ms per loop
%timeit foo_intensive(a=10, b=20, c=30, d=40)
10 loops, best of 3: 32.7 ms per loop
Even when scaled to more calls, the time required to execute the function body simply trumps the small overhead introduced by the additional bytecode instructions.
One way of viewing the bytecode issued for each call style is to create wrapper functions around foo that call it in different ways. Let's create fooDefaults for a call using the default arguments and fooKwargs for a call specifying keyword arguments:
# call foo without arguments, using the defaults
def fooDefaults():
    foo()

# call foo with keyword arguments
def fooKwargs():
    foo(a=10, b=20, c=30, d=40)
Now, with dis, we can see the differences in bytecode between the two. For the default version, essentially one instruction is issued for the call itself, CALL_FUNCTION (ignoring POP_TOP, which is present in both cases):
dis.dis(fooDefaults)
2 0 LOAD_GLOBAL 0 (foo)
3 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
6 POP_TOP
7 LOAD_CONST 0 (None)
10 RETURN_VALUE
On the other hand, in the case where keywords are used, 8 more LOAD_CONST instructions are issued in order to load the argument names ('a', 'b', 'c', 'd') and values (10, 20, 30, 40) onto the value stack (even though loading numbers < 256 is probably really fast here, since they are cached by the interpreter):
dis.dis(fooKwargs)
2 0 LOAD_GLOBAL 0 (foo)
3 LOAD_CONST 1 ('a') # call starts
6 LOAD_CONST 2 (10)
9 LOAD_CONST 3 ('b')
12 LOAD_CONST 4 (20)
15 LOAD_CONST 5 ('c')
18 LOAD_CONST 6 (30)
21 LOAD_CONST 7 ('d')
24 LOAD_CONST 8 (40)
27 CALL_FUNCTION 1024 (0 positional, 4 keyword pair)
30 POP_TOP # call ends
31 LOAD_CONST 0 (None)
34 RETURN_VALUE
Additionally, a few extra steps are required on the callee side when the number of keyword arguments is not zero (for example, in _PyEval_EvalCodeWithName() in Python/ceval.c).
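As an aside, the oparg 1024 shown next to CALL_FUNCTION above packs both argument counts into one integer; in CPython versions before 3.6, the low byte is the positional count and the high byte is the keyword-pair count, which a quick check confirms:

oparg = 1024
positional = oparg & 0xFF        # low byte  -> 0 positional arguments
kw_pairs = (oparg >> 8) & 0xFF   # high byte -> 4 keyword pairs
print(positional, kw_pairs)      # 0 4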
Even though these are really fast instructions, they do add up. The more arguments you specify, the bigger the sum, and when many calls to the function are performed, they pile up into a noticeable difference in execution time.
A direct result of this is that the more values we specify, the more instructions must be issued, and the slower the call runs. Additionally, specifying positional arguments, unpacking positional arguments and unpacking keyword arguments each have a different amount of overhead associated with them:

- foo(10, 20, 30, 40): requires 4 additional instructions to load each value.
- foo(*[10, 20, 30, 40]): 4 LOAD_CONST instructions and an additional BUILD_LIST instruction.
- foo(*l): cuts execution down a bit, since we provide an already-built list containing the values.
- foo(**{'a': 10, 'b': 20, 'c': 30, 'd': 40}): 8 LOAD_CONST instructions and a BUILD_MAP.
- foo(**d): will cut down execution, because an already-built dictionary is supplied.

All in all, the ordering of execution times for the different call styles is:
defaults < positionals < keyword arguments < list unpacking < dictionary unpacking
I suggest using dis.dis on these cases and seeing their differences.
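A minimal sketch for doing that comparison yourself (the wrapper names are made up for illustration, and both the timings and the exact bytecode will vary by CPython version):

import dis
import timeit

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

l = [10, 20, 30, 40]
d = {'a': 10, 'b': 20, 'c': 30, 'd': 40}

# One wrapper per call style, so dis shows each style's bytecode.
def via_defaults():   foo()
def via_positional(): foo(10, 20, 30, 40)
def via_keywords():   foo(a=10, b=20, c=30, d=40)
def via_list():       foo(*l)
def via_dict():       foo(**d)

for fn in (via_defaults, via_positional, via_keywords, via_list, via_dict):
    print('----', fn.__name__)
    dis.dis(fn)                               # compare instruction counts
    print(timeit.timeit(fn, number=1000000))  # rough relative timings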
As @goofd pointed out in a comment, this is really something one should not worry about; it depends on the use case. If you frequently call computationally 'light' functions, relying on the defaults will produce a slight boost in speed. If you frequently supply different values, the gain is next to nothing.
So, it's probably negligible, and trying to get boosts from obscure edge cases like this is really pushing it. If you find yourself doing this, you might want to look at things like PyPy and Cython.