
Performance overhead of nested functions in Python

In Python 3.9, nested functions are surprisingly slower than normal functions: by around 10% in my example.

from timeit import timeit

def f():
    return 0

def factory():
    def g():
        return 0

    return g

g = factory()

print(timeit("f()", globals=globals()))
#> 0.074835498
print(timeit("g()", globals=globals()))
#> 0.08470309999999998
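Single timeit runs at this scale are noisy, so a difference of around 10% is easier to trust when each function is timed several times and the fastest run is kept. The following is a minimal sketch of that idea; the helper name `best_time` and the `number`/`repeats` values are illustrative assumptions, not part of the original measurement.

```python
from timeit import repeat


def f():
    return 0


def factory():
    def g():
        return 0

    return g


g = factory()


def best_time(func, number=200_000, repeats=5):
    """Return the fastest of several timing runs of calling func()."""
    # timeit.repeat accepts a callable as the statement and returns
    # one total time per run; the minimum is the least disturbed run.
    return min(repeat(func, number=number, repeat=repeats))


t_f = best_time(f)
t_g = best_time(g)
print(f"f: {t_f:.4f}s  g: {t_g:.4f}s  ratio g/f: {t_g / t_f:.3f}")
```

The printed ratio will vary by machine and Python version, so no expected output is given here.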

dis.dis shows the same bytecode for both, and the only difference I've found is in the functions' internal code flags. Indeed, dis.show_code reveals that g has the NESTED flag while f does not.
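The comparison described above can be reproduced programmatically. This is a small sketch that compares the raw bytecode of the two functions and tests the CO_NESTED bit directly; the variable names `same_bytecode`, `f_nested`, and `g_nested` are only illustrative.

```python
import dis
import inspect


def f():
    return 0


def factory():
    def g():
        return 0

    return g


g = factory()

# Compare the raw bytecode strings of the two code objects.
same_bytecode = f.__code__.co_code == g.__code__.co_code

# Test the CO_NESTED bit in each function's code flags.
f_nested = bool(f.__code__.co_flags & inspect.CO_NESTED)
g_nested = bool(g.__code__.co_flags & inspect.CO_NESTED)

print("same bytecode:", same_bytecode)
print("f nested:", f_nested, "| g nested:", g_nested)

# dis.code_info returns the same text that dis.show_code prints,
# including a human-readable "Flags:" line.
print(dis.code_info(g))
```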

However, the flag can be removed, and doing so makes g as fast as f.

import inspect
# Clear the CO_NESTED bit; "& ~" only ever clears it, whereas "^" would set it if it were unset
g.__code__ = g.__code__.replace(co_flags=g.__code__.co_flags & ~inspect.CO_NESTED)
print(timeit("f()", globals=globals()))
#> 0.07321161100000001
print(timeit("g()", globals=globals()))
#> 0.07439838800000001

I've looked through the CPython source to understand how the CO_NESTED flag could affect function execution, but I've found nothing. Is there an explanation for this performance difference related to the CO_NESTED flag?

EDIT: Removing the CO_NESTED flag also seems to have no effect on function execution other than removing the overhead, even when the function has a captured variable.

import inspect
global_var = 40
def factory():
    captured_var = 2
    def g():
        return global_var + captured_var
    return g
g = factory()
assert g() == 42

# Clear the CO_NESTED bit without touching any other flag
g.__code__ = g.__code__.replace(co_flags=g.__code__.co_flags & ~inspect.CO_NESTED)
assert g() == 42  # function still works as expected

wyfo asked Jun 25 '21 22:06

1 Answer

I may be wrong, but I think the difference comes from the fact that g can potentially reference variables local to factory, and so needs access to two scopes for variable lookups: the globals as well as the scope of factory. It may well be that securing this additional scope (or merging the scopes of factory and globals) causes the overhead you observe. A good hint that this is happening is what you see when you nest another level of functions:

def factory():
    def ff():
        def g():
            return 0

        return g
    return ff()

g = factory()  # please note that it is equivalent from the perspective of time measurement
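Whether a nested function really carries an extra scope can be inspected directly, which may help in testing the hypothesis above. The sketch below compares a nested function that captures nothing with one that captures a variable; the names `factory_plain` and `factory_capturing` are hypothetical and used only for illustration.

```python
def factory_plain():
    def g():
        return 0  # references nothing from the enclosing scope

    return g


def factory_capturing():
    captured_var = 2

    def g():
        return captured_var  # references a variable of the enclosing scope

    return g


plain = factory_plain()
capturing = factory_capturing()

# A nested function that captures nothing has no closure cells at all.
print(plain.__closure__)           # None
print(plain.__code__.co_freevars)  # ()

# A capturing function records the free variable name and holds one cell for it.
print(capturing.__code__.co_freevars)          # ('captured_var',)
print(capturing.__closure__[0].cell_contents)  # 2
```

For a g that simply returns 0, as in the examples here, no closure is attached, so this check on its own neither confirms nor rules out a scope-related cost.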

Timings:

print(timeit("f()", globals=globals(), number=100000000))
# > 6.792911
print(timeit("g()", globals=globals(), number=100000000))
# > 7.8184555

In your first timing case I get +5.7% (it was +13.2% with your numbers); in my second example, +15.1%.
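For reference, relative overheads like these can be derived from the raw timings as nested / baseline - 1. A minimal sketch, using the timings printed in this thread and a hypothetical helper name `overhead_pct`:

```python
def overhead_pct(baseline, nested):
    """Relative slowdown of `nested` versus `baseline`, in percent."""
    return (nested / baseline - 1) * 100


# Timings from the question (f, then g, before removing the flag).
question = overhead_pct(0.074835498, 0.08470309999999998)

# Timings from the doubly nested example in this answer.
double_nested = overhead_pct(6.792911, 7.8184555)

print(f"question:      +{question:.1f}%")       # +13.2%
print(f"double nested: +{double_nested:.1f}%")  # +15.1%
```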

sophros answered Sep 22 '22 09:09