Let's say we have the following functions:
```python
def functionA(b, c):
    def _innerFunction(b, c):
        return b + c
    return _innerFunction(b, c)

def _outerFunction(b, c):
    return b + c

def functionB(b, c):
    return _outerFunction(b, c)
```
`functionA` and `functionB` do the same thing. `_outerFunction` is globally available, while `_innerFunction` is only available to `functionA`. Nested functions are useful for data hiding and privacy, but what about their memory efficiency? To my understanding, `_outerFunction` must only be loaded once, while `_innerFunction` works like a "local" variable and hence must be loaded each time `functionA` is called. Is that correct?
Regarding memory, both of them have almost the same memory footprint.
A function consists of a code object, containing the actual compiled code, and a function object, containing the closure, the name, and other dynamic variables.
The code object is compiled for all functions, inner and outer, before the code is run. It is what resides in the `.pyc` file.
The difference between an inner and an outer function is the creation of the function object. An outer function's function object is created only once, while the inner function loads the same constant code object and creates a new function object on every call.
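This re-creation is observable. In a slight variation of the question's code that returns the inner function itself, each call yields a distinct function object, while the underlying code object is shared:

```python
def functionA(b, c):
    def _innerFunction(b, c):
        return b + c
    return _innerFunction  # return the function object itself, for inspection

f1 = functionA(1, 2)
f2 = functionA(1, 2)
print(f1 is f2)                    # False: a fresh function object per call
print(f1.__code__ is f2.__code__)  # True: both wrap the same constant code object
```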
As the code object is equivalent, `_innerFunction`'s and `_outerFunction`'s memory footprints are equivalent. In `functionA` the name is used to construct the inner function object on each call, while in `functionB` the name is used to search the global module for the outer function.

Runtime, however, is not equivalent: `functionB` needs to call a global function, which is slightly slower than calling an inner function, but `functionA` needs to create a new function object on each call, which is significantly slower.
In order to prove how equivalent they are, let's check the code itself:
```python
>>> functionA.__code__.co_consts
(None, <code object _innerFunction at 0x00000296F0B6A600, file "<stdin>", line 2>, 'functionA.<locals>._innerFunction')
```
We can see the code object stored as a constant inside `functionA`. Let's extract the actual compiled bytecode:
```python
>>> functionA.__code__.co_consts[1].co_code
b'|\x00|\x01\x17\x00S\x00'
```
Now let's extract the bytecode for the outer function:
```python
>>> _outerFunction.__code__.co_code
b'|\x00|\x01\x17\x00S\x00'
```
It's exactly the same code!
The local variable positions are the same, the code is written the same, and so the actual compiled code is exactly the same.
```python
>>> functionA.__code__.co_names
()
>>> functionB.__code__.co_names
('_outerFunction',)
```
In `functionB`, instead of saving the name in the consts, the name is saved in `co_names`, which is later used for calling the global function.
The only difference in memory footprint is thus the code of `functionA` and `functionB`:
```python
>>> functionA.__code__.co_code
b'd\x01d\x02\x84\x00}\x02|\x02|\x00|\x01\x83\x02S\x00'
>>> functionB.__code__.co_code
b't\x00|\x00|\x01\x83\x02S\x00'
```
`functionA` needs to create a function object on each call, and the name for the inner function includes `functionA.<locals>.`, which entails a few extra bytes (negligible) and a slower run.
In terms of runtime, if you're calling the inner function multiple times, `functionA` is slightly faster:
```python
def functionA(b, c):
    def _innerFunction():
        return b + c
    for i in range(10_000_000):
        _innerFunction()  # Faster

def _outerFunction(b, c):
    return b + c

def functionB(b, c):
    for i in range(10_000_000):
        _outerFunction(b, c)  # Slower

def functionC(b, c):
    outerFunction = _outerFunction
    for i in range(10_000_000):
        outerFunction(b, c)  # Almost the same as A, but still slower
```
```
py -m timeit -s "import temp;" "temp.functionA(1,2)"
1 loop, best of 5: 2.45 sec per loop
py -m timeit -s "import temp;" "temp.functionB(1,2)"
1 loop, best of 5: 3.21 sec per loop
py -m timeit -s "import temp;" "temp.functionC(1,2)"
1 loop, best of 5: 2.66 sec per loop
```
If you're calling the outer function multiple times, `functionB` is significantly faster, as you avoid creating the function object:
```python
def functionA(b, c):  # Significantly slower
    def _innerFunction():
        return b + c
    return _innerFunction()

def _outerFunction(b, c):
    return b + c

def functionB(b, c):  # Significantly faster
    return _outerFunction(b, c)
```
```
py -m timeit -s "import temp;" "for i in range(10_000_000): temp.functionA(1,2)"
1 loop, best of 5: 9.46 sec per loop
py -m timeit -s "import temp;" "for i in range(10_000_000): temp.functionB(1,2)"
1 loop, best of 5: 5.48 sec per loop
```
> @KellyBundy: What about recursion?
My answer is only true for sequential runs. If the inner function recurses inside both A and B, there's no real difference in runtime or memory consumption, other than A being slightly faster. If `functionA` and `functionB` recurse themselves, B will not allow deeper recursion, but it will be significantly faster and will require less memory.
On a sequential run there is no difference in memory, as there is one function object, either stored in the global module or stored as a local variable that is constantly recreated. In the case of outside (`functionA` / `functionB`) recursion there is a memory difference: the local variable where the `_innerFunction` object is stored is not cleared, meaning an additional function object is created for every recursion inwards. In this specific example we can see an important distinction between Python and other languages: Python does not have tail-call optimization, meaning the frames aren't reused and the variables aren't removed when we recurse inwards, even though no one will reference them anymore.
You're welcome to play with the following visualization.
> I guess stack space is exactly identical, all differences are on the heap?
When you're working with Python, it's hard to divide things into a stack and a heap. The C stack is irrelevant, as Python uses its own virtual stack. Python's stack is made of the functions' currently running frames, which are loaded or created when a function is invoked. This is why they're also called stack frames: there's a linked list, or "stack", of frames, and each frame has its own mini-stack called a value stack. Both the value stack and the frame are stored on the heap.
There are plenty of benefits to this approach, and a nice anecdote is generator functions. I've actually written an article about the subject, but in short, being able to load and unload the stack at will allows us to pause a function in the middle of its execution, and it is the basis for both generators and asyncio.
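As a small illustration of that mechanism (a sketch, not the article's example): a generator's frame is suspended at each `yield`, keeps its locals, and is resumed rather than recreated by the next `next()` call:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
print(next(g))                   # 3: runs until the first yield, then suspends
print(g.gi_frame.f_locals["n"])  # 3: the suspended frame still holds its locals
print(next(g))                   # 2: the same frame is resumed, not recreated
```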