CPython 3.6.4:
from functools import partial

def add(x, y, z, a):
    return x + y + z + a

list_of_as = list(range(10000))

def max1():
    return max(list_of_as, key=lambda a: add(10, 20, 30, a))

def max2():
    return max(list_of_as, key=partial(add, 10, 20, 30))
now:
In [2]: %timeit max1()
4.36 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: %timeit max2()
3.67 ms ± 25.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I thought partial just remembers part of the parameters and then forwards them to the original function when it is called with the remaining parameters (so it's nothing more than a shortcut), but apparently it does some optimization. In my case the whole max2 function comes out about 15% faster than max1, which is pretty nice.
It would be great to know what that optimization is, so I could use it more efficiently. The docs are silent about any optimization. Not surprisingly, the "roughly equivalent" implementation given in the docs does not show this speedup at all:
In [3]: %timeit max2() # using `partial` implementation from docs
10.7 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
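For reference, here's a sketch of that pure-Python fallback, along the lines of the "roughly equivalent" code in the functools docs (the exact code differs slightly between Python versions; python_partial is just my name for it here):

def python_partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        # Merge the stored keyword arguments with the new ones...
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        # ...and concatenate the stored positionals with the new ones.
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc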
The following applies only to CPython; for other Python implementations the picture could be completely different. You said your question is about CPython, but nevertheless I think it's important to realize that these in-depth questions almost always depend on implementation details that may differ between implementations and even between CPython versions (for example, CPython 2.7 could be completely different, and so could CPython 3.5)!
First of all, I can't reproduce a difference of 15% or even 20%. On my computer the difference is around ~10%. It's even smaller when you change the lambda so it doesn't have to look up add in the global scope (as already pointed out in the comments, you can pass the add function as a default argument so the lookup happens in the local scope).
from functools import partial

def add(x, y, z, a):
    return x + y + z + a

def max_lambda_default(lst):
    return max(lst, key=lambda a, add=add: add(10, 20, 30, a))

def max_lambda(lst):
    return max(lst, key=lambda a: add(10, 20, 30, a))

def max_partial(lst):
    return max(lst, key=partial(add, 10, 20, 30))
I actually benchmarked these:
from simple_benchmark import benchmark
from collections import OrderedDict

# list sizes 2, 4, 8, ..., 2**19, each mapped to a test list of that size
arguments = OrderedDict((2**i, list(range(2**i))) for i in range(1, 20))
b = benchmark([max_lambda_default, max_lambda, max_partial], arguments, "list size")

%matplotlib notebook
b.plot_difference_percentage(relative_to=max_partial)
It's very hard to pin down the exact reason for the difference. However, there are a few possible explanations, assuming you have a CPython version with a compiled _functools module (all desktop versions of CPython that I use have it).
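On CPython, functools imports partial from _functools when the compiled module is available, so a quick check looks like this (a sketch):

try:
    from _functools import partial  # the C implementation used by functools
    print("compiled _functools is available")
except ImportError:
    print("only the pure-Python partial is available")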
As you already found out, the pure-Python version of partial will be significantly slower.
partial is implemented in C and can call the function directly, without an intermediate Python layer1. A lambda, on the other hand, needs to do a Python-level call to the "captured" function.

Also, partial actually knows how the arguments fit together, so it can build the arguments passed to the function more efficiently (it simply concatenates the stored argument tuple to the passed-in argument tuple) instead of constructing a completely new argument tuple.
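You can see the stored pieces on the partial object itself, and conceptually a call just glues the stored tuple onto the new arguments (a sketch of the idea, not the actual C code):

from functools import partial

def add(x, y, z, a):
    return x + y + z + a

p = partial(add, 10, 20, 30)
print(p.func)  # <function add at 0x...>
print(p.args)  # (10, 20, 30)
# Conceptually p(5) performs p.func(*(p.args + (5,))), i.e. add(10, 20, 30, 5)
print(p(5))    # 65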
In more recent Python versions several internals were changed in an effort to speed up function calls (the so-called FASTCALL optimization). Victor Stinner has a list of related pull requests on his blog in case you want to find out more about it. That probably affects both the lambda and the partial, but again, because partial is a C function it knows which function to call directly, without having to infer it the way the lambda does.
However, it's very important to realize that creating the partial has some overhead. The break-even point is at roughly 10 list elements; if the list is shorter, the lambda will be faster.
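You can get a rough feel for that creation overhead in isolation (an IPython sketch; the absolute numbers depend on machine and Python version):

%timeit partial(add, 10, 20, 30)        # builds a partial object each time
%timeit lambda a: add(10, 20, 30, a)    # builds a function object each time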
1 If you call a function from Python it uses the opcode CALL_FUNCTION, which is actually a wrapper (that's what I meant by Python layer) around the PyObject_Call* (or FASTCALL) functions. It also includes creating the argument tuple/dictionary. If you call a function from a C function you can avoid this thin wrapper by calling the PyObject_Call* functions directly.
In case you're interested in the opcodes, you can disassemble the function with dis:
import dis
dis.dis(max_lambda_default)
0 LOAD_GLOBAL 0 (max)
2 LOAD_FAST 0 (lst)
4 LOAD_GLOBAL 1 (add)
6 BUILD_TUPLE 1
8 LOAD_CONST 1 (<code object <lambda>>)
10 LOAD_CONST 2 ('max_lambda_default.<locals>.<lambda>')
12 MAKE_FUNCTION 1 (defaults)
14 LOAD_CONST 3 (('key',))
16 CALL_FUNCTION_KW 2
18 RETURN_VALUE
Disassembly of <code object <lambda>>:
0 LOAD_FAST 1 (add) <--- (2)
2 LOAD_CONST 1 (10)
4 LOAD_CONST 2 (20)
6 LOAD_CONST 3 (30)
8 LOAD_FAST 0 (a)
10 CALL_FUNCTION 4 <--- (1)
12 RETURN_VALUE
As you can see, the CALL_FUNCTION opcode (1) is actually in there.
As an aside: the LOAD_FAST (2) is responsible for the performance difference between max_lambda_default and max_lambda without the default (which has to resort to a slower lookup). Because add is a parameter (with a default) of the lambda, it lives in the local scope and is loaded with LOAD_FAST, which is a simple array index. Without the default, the name isn't local, so the lambda falls back to a global lookup, which does dictionary lookups in the global (and, if needed, builtin) namespace. And that lookup is done every time the lambda is called!
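For comparison, disassembling max_lambda shows the global lookup in the lambda body (the output I would expect on CPython 3.6; exact offsets and argument indices may differ on other versions):

import dis
dis.dis(max_lambda)

Disassembly of <code object <lambda>>:
0 LOAD_GLOBAL 0 (add) <--- dict lookup in the global namespace on every call
2 LOAD_CONST 1 (10)
4 LOAD_CONST 2 (20)
6 LOAD_CONST 3 (30)
8 LOAD_FAST 0 (a)
10 CALL_FUNCTION 4
12 RETURN_VALUE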