CPython 3.6.4:
from functools import partial

def add(x, y, z, a):
    return x + y + z + a

list_of_as = list(range(10000))

def max1():
    return max(list_of_as, key=lambda a: add(10, 20, 30, a))

def max2():
    return max(list_of_as, key=partial(add, 10, 20, 30))
now:
In [2]: %timeit max1()
4.36 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: %timeit max2()
3.67 ms ± 25.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I thought partial just remembers part of the parameters and then forwards them to the original function when it is called with the remaining parameters (so it's nothing more than a shortcut), but apparently it does some optimization. In my case the whole max2 function comes out about 15% faster than max1, which is pretty nice.
It would be great to know what that optimization is, so I could use it more efficiently. The docs are silent about any optimization. Not surprisingly, the "roughly equivalent" implementation given in the docs does not show this speedup at all:
In [3]: %timeit max2() # using `partial` implementation from docs
10.7 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
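For reference, here's a sketch of that pure-Python fallback, along the lines of the "roughly equivalent" code in the functools docs (the exact code differs slightly between Python versions; python_partial is just my name for it here):

def python_partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        # Merge the stored keyword arguments with the new ones...
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        # ...and concatenate the stored positionals with the new ones.
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc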
The following applies only to CPython; for other Python implementations the picture could be completely different. You said your question is about CPython, but nevertheless I think it's important to realize that these in-depth questions almost always depend on implementation details that may differ between implementations and even between CPython versions (for example, CPython 2.7 could be completely different, and so could CPython 3.5)!
First of all, I can't reproduce a difference of 15% or even 20%. On my computer the difference is around ~10%. It's even smaller when you change the lambda so it doesn't have to look up add in the global scope (as already pointed out in the comments, you can pass the add function as a default argument so the lookup happens in the local scope).
from functools import partial

def add(x, y, z, a):
    return x + y + z + a

def max_lambda_default(lst):
    return max(lst, key=lambda a, add=add: add(10, 20, 30, a))

def max_lambda(lst):
    return max(lst, key=lambda a: add(10, 20, 30, a))

def max_partial(lst):
    return max(lst, key=partial(add, 10, 20, 30))
I actually benchmarked these:
from simple_benchmark import benchmark
from collections import OrderedDict

# list sizes 2, 4, 8, ..., 2**19, each mapped to a test list of that size
arguments = OrderedDict((2**i, list(range(2**i))) for i in range(1, 20))
b = benchmark([max_lambda_default, max_lambda, max_partial], arguments, "list size")

%matplotlib notebook
b.plot_difference_percentage(relative_to=max_partial)
It's very hard to pin down the exact reason for the difference. However, there are a few possible explanations, assuming you have a CPython version with a compiled _functools module (all desktop versions of CPython that I use have it).
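On CPython, functools imports partial from _functools when the compiled module is available, so a quick check looks like this (a sketch):

try:
    from _functools import partial  # the C implementation used by functools
    print("compiled _functools is available")
except ImportError:
    print("only the pure-Python partial is available")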
As you already found out, the pure-Python version of partial will be significantly slower.
partial is implemented in C and can call the function directly, without an intermediate Python layer1. A lambda, on the other hand, needs to do a Python-level call to the "captured" function.

Also, partial actually knows how the arguments fit together, so it can build the arguments passed to the function more efficiently (it simply concatenates the stored argument tuple to the passed-in argument tuple) instead of constructing a completely new argument tuple.
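You can see the stored pieces on the partial object itself, and conceptually a call just glues the stored tuple onto the new arguments (a sketch of the idea, not the actual C code):

from functools import partial

def add(x, y, z, a):
    return x + y + z + a

p = partial(add, 10, 20, 30)
print(p.func)  # <function add at 0x...>
print(p.args)  # (10, 20, 30)
# Conceptually p(5) performs p.func(*(p.args + (5,))), i.e. add(10, 20, 30, 5)
print(p(5))    # 65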
In more recent Python versions several internals were changed in an effort to speed up function calls (the so-called FASTCALL optimization). Victor Stinner has a list of related pull requests on his blog in case you want to find out more about it. That probably affects both the lambda and the partial, but again, because partial is a C function it knows which function to call directly, without having to infer it the way the lambda does.
However, it's very important to realize that creating the partial has some overhead. The break-even point is at roughly 10 list elements; if the list is shorter, the lambda will be faster.
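You can get a rough feel for that creation overhead in isolation (an IPython sketch; the absolute numbers depend on machine and Python version):

%timeit partial(add, 10, 20, 30)        # builds a partial object each time
%timeit lambda a: add(10, 20, 30, a)    # builds a function object each time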
1 If you call a function from Python it uses the opcode CALL_FUNCTION, which is actually a wrapper (that's what I meant by Python layer) around the PyObject_Call* (or FASTCALL) functions. It also includes creating the argument tuple/dictionary. If you call a function from a C function you can avoid this thin wrapper by calling the PyObject_Call* functions directly.
In case you're interested in the opcodes, you can disassemble the function with dis:
import dis
dis.dis(max_lambda_default)
0 LOAD_GLOBAL 0 (max)
2 LOAD_FAST 0 (lst)
4 LOAD_GLOBAL 1 (add)
6 BUILD_TUPLE 1
8 LOAD_CONST 1 (<code object <lambda>>)
10 LOAD_CONST 2 ('max_lambda_default.<locals>.<lambda>')
12 MAKE_FUNCTION 1 (defaults)
14 LOAD_CONST 3 (('key',))
16 CALL_FUNCTION_KW 2
18 RETURN_VALUE
Disassembly of <code object <lambda>>:
0 LOAD_FAST 1 (add) <--- (2)
2 LOAD_CONST 1 (10)
4 LOAD_CONST 2 (20)
6 LOAD_CONST 3 (30)
8 LOAD_FAST 0 (a)
10 CALL_FUNCTION 4 <--- (1)
12 RETURN_VALUE
As you can see, the CALL_FUNCTION opcode (1) is actually in there.
As an aside: the LOAD_FAST (2) is responsible for the performance difference between max_lambda_default and max_lambda without the default (which has to resort to a slower lookup). Because add is a parameter (with a default) of the lambda, it lives in the local scope and is loaded with LOAD_FAST, which is a simple array index. Without the default, the name isn't local, so the lambda falls back to a global lookup, which does dictionary lookups in the global (and, if needed, builtin) namespace. And that lookup is done every time the lambda is called!
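For comparison, disassembling max_lambda shows the global lookup in the lambda body (the output I would expect on CPython 3.6; exact offsets and argument indices may differ on other versions):

import dis
dis.dis(max_lambda)

Disassembly of <code object <lambda>>:
0 LOAD_GLOBAL 0 (add) <--- dict lookup in the global namespace on every call
2 LOAD_CONST 1 (10)
4 LOAD_CONST 2 (20)
6 LOAD_CONST 3 (30)
8 LOAD_FAST 0 (a)
10 CALL_FUNCTION 4
12 RETURN_VALUE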