
Memory efficiency of nested functions in Python

Let's say we have the following functions:

def functionA(b, c):
    def _innerFunction(b, c):
        return b + c
    return _innerFunction(b, c)

def _outerFunction(b, c):
    return b + c

def functionB(b, c):
    return _outerFunction(b, c)

functionA and functionB do the same thing. _outerFunction is globally available, while _innerFunction is only available inside functionA. Nested functions are useful for data hiding and privacy, but what about their memory efficiency? To my understanding, _outerFunction only has to be loaded once, while _innerFunction works like a "local" variable and hence must be loaded each time functionA is called. Is that correct?

Asked Jan 24 '23 by CLRW97

1 Answer

Regarding memory, both of them have almost the same footprint.

A function is composed of a code object, containing the actual compiled bytecode, and a function object, containing the closure, the name, and other dynamic variables.

The code object is compiled for all functions, inner and outer, before the code is run. It is what resides in the .pyc file.
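
You can verify this without ever executing the functions, by compiling the source and inspecting its constants. A small sketch (the "<demo>" filename is arbitrary):

import types

src = '''
def functionA(b, c):
    def _innerFunction(b, c):
        return b + c
    return _innerFunction(b, c)
'''
module_code = compile(src, "<demo>", "exec")

# functionA's code object exists before the def statement ever runs...
func_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))
# ...and _innerFunction's code object is nested inside it as a constant:
inner_code = next(c for c in func_code.co_consts
                  if isinstance(c, types.CodeType))
print(func_code.co_name, inner_code.co_name)  # functionA _innerFunction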

The difference between an inner and an outer function is the creation of the function object. An outer function's object is created only once, while the inner function loads the same constant code object and creates a new function object on every run.

As the code objects are equivalent, _innerFunction's and _outerFunction's memory footprints are equivalent:

  • In both cases you have the name of the function as a constant. In functionA, the name will be used to construct the inner function object on each run, while in functionB the name will be used to look up the outer function in the global module.
  • In both cases you need to hold a code object, either in the global module or in functionA.
  • In both cases you have the same parameters and same space saved for variables.

Runtime, however, is not equivalent: functionB needs to call a global function, which is slightly slower than calling an inner function, but functionA needs to create a new function object on each run, which is significantly slower.
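
The difference is easy to see in the bytecode. A quick sketch using the dis module (exact opcode names vary between CPython versions, but the shape holds): functionA's disassembly contains a MAKE_FUNCTION, building the function object on every call, while functionB's contains a LOAD_GLOBAL for _outerFunction instead.

import dis

dis.dis(functionA)  # look for MAKE_FUNCTION: a function object built per call
dis.dis(functionB)  # look for LOAD_GLOBAL: the existing global is reused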


In order to prove how equivalent they are, let's check the code itself:

>>> functionA.__code__.co_consts
(None, <code object _innerFunction at 0x00000296F0B6A600, file "<stdin>", line 2>, 'functionA.<locals>._innerFunction')

We can see the code object as a const stored inside functionA. Let's extract the actual compiled bytecode:

>>> functionA.__code__.co_consts[1].co_code
b'|\x00|\x01\x17\x00S\x00'

Now let's extract the bytecode for the outer function:

>>> _outerFunction.__code__.co_code
b'|\x00|\x01\x17\x00S\x00'

It's exactly the same code!

The local variable positions are the same, the code is written the same, and so the actual compiled code is exactly the same.
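
To make the equivalence explicit, we can assert it directly (this sketch assumes the co_consts layout shown above, which may shift between CPython versions):

# The inner code object sits in functionA's constants, as seen above.
inner_code = functionA.__code__.co_consts[1]

# Same compiled bytecode, same local-variable layout:
assert inner_code.co_code == _outerFunction.__code__.co_code
assert inner_code.co_varnames == _outerFunction.__code__.co_varnames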

>>> functionA.__code__.co_names
()
>>> functionB.__code__.co_names
('_outerFunction',)

In functionB, instead of saving the name in the consts, the name is saved in co_names, which is later used for calling the global function.

The only difference in memory footprint is thus the code of functionA and functionB:

>>> functionA.__code__.co_code
b'd\x01d\x02\x84\x00}\x02|\x02|\x00|\x01\x83\x02S\x00'
>>> functionB.__code__.co_code
b't\x00|\x00|\x01\x83\x02S\x00'

functionA needs to create a function object on each run, and the name for the inner function includes functionA.<locals>., which entails a few extra bytes (negligible) and a slower run.
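
A rough way to see that difference (a sketch; exact numbers vary by CPython version and platform):

import sys

# functionA's bytecode is a few bytes longer, since it also builds the
# inner function object before calling it:
print(len(functionA.__code__.co_code), sys.getsizeof(functionA.__code__))
print(len(functionB.__code__.co_code), sys.getsizeof(functionB.__code__))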


In terms of runtime, if you're calling the inner function multiple times, functionA is slightly faster:

def functionA(b, c):
    def _innerFunction():
        return b + c
    for i in range(10_000_000):
        _innerFunction() # Faster

def _outerFunction(b, c):
    return b + c

def functionB(b, c):
    for i in range(10_000_000):
        _outerFunction(b, c) # Slower

def functionC(b, c):
    outerFunction = _outerFunction
    for i in range(10_000_000):
        outerFunction(b, c) # Almost same as A but still slower.

py -m timeit -s "import temp;" "temp.functionA(1,2)"
1 loop, best of 5: 2.45 sec per loop

py -m timeit -s "import temp;" "temp.functionB(1,2)" 
1 loop, best of 5: 3.21 sec per loop

py -m timeit -s "import temp;" "temp.functionC(1,2)" 
1 loop, best of 5: 2.66 sec per loop

If you're calling the outer function multiple times, functionB is significantly faster as you avoid creating the function object:

def functionA(b, c):  # Significantly slower
    def _innerFunction():
        return b + c
    return _innerFunction()

def _outerFunction(b, c):
    return b + c

def functionB(b, c):  # Significantly faster
    return _outerFunction(b, c)


py -m timeit -s "import temp;" "for i in range(10_000_000): temp.functionA(1,2)" 
1 loop, best of 5: 9.46 sec per loop

py -m timeit -s "import temp;" "for i in range(10_000_000): temp.functionB(1,2)" 
1 loop, best of 5: 5.48 sec per loop

@KellyBundy: What about recursion?

My answer is only true for sequential runs.

If the inner function recurses inside both A and B, there's no real difference in runtime or memory consumption, other than A being slightly faster. If functionA and functionB recurse themselves, B will allow a deeper recursion, be significantly faster, and require less memory.

On a sequential run there is no difference in memory, as there is a single function object, either stored in the global module or stored as a local variable that is constantly recreated. In case of outside (A or B) recursion there is a memory difference: the local variable holding the _innerFunction object is not cleared, meaning an additional function object is created and kept alive for every level of recursion inwards. This example also shows an important distinction between Python and other languages: Python does not have tail-call optimization, so frames aren't reused and variables aren't removed as we recurse inwards, even though nothing will reference them anymore.
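
A minimal sketch of that effect, using a hypothetical recursive variant of functionA; each recursion level keeps its own inner function object alive until the whole recursion unwinds:

def functionA_recursive(depth):  # hypothetical, for illustration only
    def _innerFunction():
        return depth
    if depth > 0:
        # Every level holds its own _innerFunction; since they are all
        # alive at the same time, the printed ids are all distinct.
        print(id(_innerFunction))
        functionA_recursive(depth - 1)
    return _innerFunction()

functionA_recursive(3)  # prints three distinct ids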


I guess stack space is exactly identical, all differences are on the heap?

When you're working with Python, it's hard to divide things into a stack and a heap. The C stack is largely irrelevant, as Python uses its own virtual stack. Python's stack is made of the currently running functions' frames, each loaded or created when a function is invoked. That is why they're also called stack frames: there is a linked list, or "stack", of frames, and each frame has its own mini-stack called a value stack. Both the frames and their value stacks are stored on the heap.
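
You can watch that linked list of frames yourself (an illustrative sketch; sys._getframe is a CPython implementation detail):

import sys

def outer():
    inner()

def inner():
    frame = sys._getframe()
    while frame is not None:
        print(frame.f_code.co_name)  # inner, outer, <module>
        frame = frame.f_back         # follow the link to the caller's frame

outer()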

There are plenty of benefits to this approach, and a nice anecdote is generator functions. I've actually written an article about the subject, but in short, being able to load and unload the stack at will allows us to pause a function in the middle of its execution, and is the basis for both generators and asyncio.
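
A tiny demonstration of that pausing, a generator whose frame survives on the heap between calls:

def counter():
    n = 0
    while True:
        yield n   # execution pauses here; the frame is kept alive
        n += 1    # ...and resumes here on the next call to next()

gen = counter()
print(next(gen), next(gen), next(gen))  # 0 1 2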

Answered Feb 03 '23 by Bharel