 

Why do argument-less function calls execute faster?

I set up a simple custom function that takes some default arguments (Python 3.5):

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

and timed different calls to it with or without specifying argument values:

Without specifying arguments:

%timeit foo()
The slowest run took 7.83 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 361 ns per loop

Specifying arguments:

%timeit foo(a=10, b=20, c=30, d=40)
The slowest run took 12.83 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 446 ns per loop

As you can see, there is a noticeable difference between the call specifying arguments and the one not specifying them. In simple one-off calls this might be negligible, but the overhead scales and becomes more noticeable as the number of calls to the function grows:

No arguments:

%timeit for i in range(10000): foo()
100 loops, best of 3: 3.83 ms per loop

With Arguments:

%timeit for i in range(10000): foo(a=10, b=20, c=30, d=40)
100 loops, best of 3: 4.68 ms per loop

The same behaviour is also present in Python 2.7, where the time difference between these calls was actually a bit larger: foo() -> 291 ns and foo(a=10, b=20, c=30, d=40) -> 410 ns.
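For anyone reproducing this outside IPython, a minimal sketch of the same comparison using the standard timeit module (its globals parameter requires Python 3.5+); absolute numbers are machine-dependent:

```python
import timeit

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

# Average time per call over many iterations; the no-argument
# call is consistently faster, though the gap is tiny per call.
n = 100000
t_defaults = timeit.timeit("foo()", globals={"foo": foo}, number=n) / n
t_kwargs = timeit.timeit("foo(a=10, b=20, c=30, d=40)",
                         globals={"foo": foo}, number=n) / n
print("defaults: %.0f ns, kwargs: %.0f ns"
      % (t_defaults * 1e9, t_kwargs * 1e9))
```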


Why does this happen? Should I generally try and avoid specifying argument values during calls?

asked Jan 04 '16 by Dimitris Fasarakis Hilliard


1 Answer

Why does this happen? Should I avoid specifying argument values during calls?

Generally, no. The only reason you can see a difference here is that the function you are timing is not computationally intensive. As a result, the time taken by the additional byte code instructions issued when arguments are supplied shows up in the measurements.

If, for example, you had a more computationally intensive function of the form:

def foo_intensive(a=10, b=20, c=30, d=40):
    return [i * j for i in range(a * b) for j in range(c * d)]

it will show practically no difference whatsoever in the time required:

%timeit foo_intensive()
10 loops, best of 3: 32.7 ms per loop

%timeit foo_intensive(a=10, b=20, c=30, d=40)
10 loops, best of 3: 32.7 ms per loop

Even when scaled to more calls, the time required to execute the function body simply trumps the small overhead introduced by additional byte code instructions.


Looking at the Byte Code:

One way of viewing the byte code generated for each call is to create wrapper functions that call foo in different ways. Let's create fooDefaults for calls using the default arguments and fooKwargs for calls specifying keyword arguments:

# call foo without arguments, using defaults
def fooDefaults():
    foo()

# call foo with keyword arguments
def fooKwargs():
    foo(a=10, b=20, c=30, d=40)

Now with dis we can see the differences in byte code between these. For the default version, essentially one command is issued for the function call, CALL_FUNCTION (ignoring POP_TOP, which is present in both cases):

dis.dis(fooDefaults)
  2           0 LOAD_GLOBAL              0 (foo)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)  
              6 POP_TOP
              7 LOAD_CONST               0 (None)
             10 RETURN_VALUE

On the other hand, in the case where keywords are used, 8 more LOAD_CONST commands are issued in order to load the argument names (a, b, c, d) and values (10, 20, 30, 40) onto the value stack (even though loading the numbers themselves is probably very fast here, since small integers are cached by CPython):

dis.dis(fooKwargs)
  2           0 LOAD_GLOBAL              0 (foo)
              3 LOAD_CONST               1 ('a')    # call starts
              6 LOAD_CONST               2 (10)
              9 LOAD_CONST               3 ('b')
             12 LOAD_CONST               4 (20)
             15 LOAD_CONST               5 ('c')
             18 LOAD_CONST               6 (30)
             21 LOAD_CONST               7 ('d')
             24 LOAD_CONST               8 (40)
             27 CALL_FUNCTION         1024 (0 positional, 4 keyword pair)
             30 POP_TOP                             # call ends
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE

Additionally, a few extra steps are generally required inside the interpreter when the number of keyword arguments is non-zero (see, for example, _PyEval_EvalCodeWithName() in ceval).

Even though these are very fast commands, they do add up. The more arguments, the bigger the sum, and when many calls to the function are performed, the extra instructions pile up into a noticeable difference in execution time.
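These per-call instruction counts can be compared programmatically with dis.get_instructions (available since Python 3.4). A small sketch: exact opcode names vary across CPython versions, but the keyword-argument caller always needs more instructions than the default-argument caller:

```python
import dis

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

# Wrappers mirroring the two call styles disassembled above.
def fooDefaults():
    foo()

def fooKwargs():
    foo(a=10, b=20, c=30, d=40)

# Count the byte code instructions compiled for each caller.
n_defaults = len(list(dis.get_instructions(fooDefaults)))
n_kwargs = len(list(dis.get_instructions(fooKwargs)))
print(n_defaults, n_kwargs)
```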


A direct result of this is that the more values we specify, the more commands must be issued, and the slower the call. Additionally, specifying positional arguments, unpacking positional arguments and unpacking keyword arguments each have a different amount of overhead associated with them:

  1. Positional arguments foo(10, 20, 30, 40): require 4 additional LOAD_CONST commands, one per value.
  2. List unpacking foo(*[10, 20, 30, 40]): 4 LOAD_CONST commands plus an additional BUILD_LIST command.
    • Using a pre-built list as in foo(*l) cuts execution down a bit, since the values are already collected in a list.
  3. Dictionary unpacking foo(**{'a': 10, 'b': 20, 'c': 30, 'd': 40}): 8 LOAD_CONST commands plus a BUILD_MAP.
    • As with list unpacking, foo(**d) cuts execution down because an already-built dictionary is supplied.

All in all, the ordering of execution times for the different call styles is:

defaults < positionals < keyword arguments < list unpacking < dictionary unpacking

I suggest using dis.dis on these cases and seeing their differences.
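As a quick sketch of that suggestion, each call style can be wrapped in a small caller and its instruction count compared; opcode names vary across CPython versions, so the comparison below is on counts rather than specific opcodes:

```python
import dis

def foo(a=10, b=20, c=30, d=40):
    return a * b + c * d

# One caller per call style from the list above.
callers = {
    "defaults":   lambda: foo(),
    "positional": lambda: foo(10, 20, 30, 40),
    "keyword":    lambda: foo(a=10, b=20, c=30, d=40),
    "list":       lambda: foo(*[10, 20, 30, 40]),
    "dict":       lambda: foo(**{'a': 10, 'b': 20, 'c': 30, 'd': 40}),
}

# Number of byte code instructions compiled for each caller.
counts = {name: len(list(dis.get_instructions(fn)))
          for name, fn in callers.items()}
print(counts)
```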


In conclusion:

As @goofd pointed out in a comment, this really isn't something one should worry about; it depends on the use case. If you frequently call computationally 'light' functions, letting them use their defaults produces a slight boost in speed. If you frequently supply different values, the difference amounts to next to nothing.

So, the difference is probably negligible, and trying to squeeze speed out of obscure edge cases like this is really pushing it. If you find yourself doing this, you might want to look at tools like PyPy and Cython instead.

answered Sep 26 '22 by Dimitris Fasarakis Hilliard