What exactly is the optimization `functools.partial` is making?

CPython 3.6.4:

from functools import partial

def add(x, y, z, a):
    return x + y + z + a

list_of_as = list(range(10000))

def max1():
    return max(list_of_as, key=lambda a: add(10, 20, 30, a))

def max2():
    return max(list_of_as, key=partial(add, 10, 20, 30))

now:

In [2]: %timeit max1()
4.36 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit max2()
3.67 ms ± 25.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I thought partial just remembers part of the parameters and then forwards them to the original function when it's called with the rest of the parameters (so it's nothing more than a shortcut), but it seems to do some optimization. In my case the whole max2 call is about 15% faster than max1, which is pretty nice.

It would be great to know what the optimization is, so I could use it in a more efficient way. The docs are silent regarding any optimization. Not surprisingly, the "roughly equivalent to" implementation given in the docs does not optimize at all:

In [3]: %timeit max2()  # using `partial` implementation from docs 
10.7 ms ± 267 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
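For reference, the "roughly equivalent" pure-Python version from the docs looks roughly like this (paraphrased here; the exact code in the docs differs slightly between Python versions):

# Paraphrased "roughly equivalent" pure-Python partial from the functools docs
def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        # merge the stored keyword arguments with the new ones
        newkeywords = {**keywords, **fkeywords}
        # forward stored positional args followed by the new ones
        return func(*args, *fargs, **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc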
asked Apr 22 '18 by pawelswiecki




1 Answer

The following arguments actually apply only to CPython; for other Python implementations it could be completely different. You said your question is about CPython, but nevertheless I think it's important to realize that these in-depth questions almost always depend on implementation details that might differ between implementations and even between CPython versions (for example, CPython 2.7 could be completely different, but so could CPython 3.5)!

Timings

First of all, I can't reproduce differences of 15% or even 20%. On my computer the difference is around ~10%. It's even smaller when you change the lambda so it doesn't have to look up add in the global scope (as already pointed out in the comments, you can pass the add function as a default argument so the lookup happens in the local scope).

from functools import partial

def add(x, y, z, a):
    return x + y + z + a

def max_lambda_default(lst):
    return max(lst, key=lambda a, add=add: add(10, 20, 30, a))

def max_lambda(lst):
    return max(lst, key=lambda a: add(10, 20, 30, a))

def max_partial(lst):
    return max(lst, key=partial(add, 10, 20, 30))

I actually benchmarked these:

[Benchmark plot: runtime difference in percent relative to max_partial, plotted against list size]

from simple_benchmark import benchmark
from collections import OrderedDict

arguments = OrderedDict((2**i, list(range(2**i))) for i in range(1, 20))
b = benchmark([max_lambda_default, max_lambda, max_partial], arguments, "list size")

%matplotlib notebook
b.plot_difference_percentage(relative_to=max_partial)

Possible explanations

It's very hard to find the exact reason for the difference. However, there are a few possible explanations, assuming you have a CPython version with the compiled _functools module (all desktop versions of CPython that I use have it).

As you already found out, the pure-Python version of partial will be significantly slower.

  • partial is implemented in C and can call the function directly - without an intermediate Python layer1. The lambda, on the other hand, needs to do a Python-level call to the "captured" function.

  • partial actually knows how the arguments fit together, so it can build the arguments that are passed to the function more efficiently (it just concatenates the stored argument tuple to the passed-in argument tuple) instead of constructing a completely new argument tuple - see the sketch after this list.

  • In more recent Python versions several internals were changed in an effort to optimize function calls (the so called FASTCALL optimization). Victor Stinner has a list of related pull requests on his blog in case you want to find out more about it.

    That probably affects both the lambda and the partial, but again, because partial is a C function it knows which function to call directly, without having to look it up the way the lambda does.
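As a rough Python-level picture of the second point (the real concatenation happens in C inside _functools, so this is only an illustrative sketch):

from functools import partial

def add(x, y, z, a):
    return x + y + z + a

p = partial(add, 10, 20, 30)
print(p.func, p.args, p.keywords)   # add, (10, 20, 30), {}

# Conceptually, p(5) behaves like
#     p.func(*(p.args + (5,)), **p.keywords)
# but the C implementation builds the combined argument tuple directly,
# without an extra Python-level call.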

However, it's very important to realize that creating the partial has some overhead. The break-even point is at roughly 10 list elements; if the list is shorter, the lambda will be faster.
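If you want to check where that break-even point lies on your own machine, something along these lines works (the sizes are just examples and the exact crossover will vary):

from functools import partial
import timeit

def add(x, y, z, a):
    return x + y + z + a

for n in (5, 10, 100, 10000):
    lst = list(range(n))
    # Build the key function inside the timed statement so that the
    # construction overhead of the partial/lambda is part of the measurement.
    t_lambda = timeit.timeit(lambda: max(lst, key=lambda a: add(10, 20, 30, a)), number=100)
    t_partial = timeit.timeit(lambda: max(lst, key=partial(add, 10, 20, 30)), number=100)
    print(n, t_lambda, t_partial)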

Footnotes

1 If you call a function from Python it uses the opcode CALL_FUNCTION, which is actually a wrapper (that's what I meant by Python layer) around the PyObject_Call* (or FASTCALL) functions. It also includes creating the argument tuple/dictionary. If you call a function from a C function, you can avoid this thin wrapper by calling the PyObject_Call* functions directly.

In case you're interested in the opcodes, you can disassemble the function:

import dis

dis.dis(max_lambda_default)

 0 LOAD_GLOBAL              0 (max)
 2 LOAD_FAST                0 (lst)
 4 LOAD_GLOBAL              1 (add)
 6 BUILD_TUPLE              1
 8 LOAD_CONST               1 (<code object <lambda>>)
10 LOAD_CONST               2 ('max_lambda_default.<locals>.<lambda>')
12 MAKE_FUNCTION            1 (defaults)
14 LOAD_CONST               3 (('key',))
16 CALL_FUNCTION_KW         2
18 RETURN_VALUE

Disassembly of <code object <lambda>>:
 0 LOAD_FAST                1 (add)      <--- (2)
 2 LOAD_CONST               1 (10)
 4 LOAD_CONST               2 (20)
 6 LOAD_CONST               3 (30)
 8 LOAD_FAST                0 (a)
10 CALL_FUNCTION            4            <--- (1)
12 RETURN_VALUE

As you can see, the CALL_FUNCTION opcode (1) is actually in there.

As an aside: the LOAD_FAST (2) is responsible for the performance difference between max_lambda_default and max_lambda (the version without the default, which has to resort to a slower lookup). That's because loading a name starts by checking the local scope (the function scope); in the case of add=add the add function is in the local scope, so the lookup is faster. If the name isn't in the local scope, each surrounding scope is checked until it's found, stopping only at the global scope. And that lookup is done every time the lambda is called!
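You can see that difference directly by also disassembling the version without the default argument; its inner code object loads add with LOAD_GLOBAL instead of LOAD_FAST (output abbreviated, exact offsets depend on the CPython version):

import dis

dis.dis(max_lambda)

Disassembly of <code object <lambda>>:
 0 LOAD_GLOBAL              0 (add)      <--- looked up outside the local scope on every call
 2 LOAD_CONST               1 (10)
 4 LOAD_CONST               2 (20)
 6 LOAD_CONST               3 (30)
 8 LOAD_FAST                0 (a)
10 CALL_FUNCTION            4
12 RETURN_VALUE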

answered Oct 12 '22 by MSeifert