Functions as objects in Python: what exactly is stored in memory?

Tags: python

I've been using Python for a while now to solve practical problems, but I still don't have a proper theoretical understanding of what's going on under the hood. For example, I'm struggling to understand how Python manages to treat functions as objects. I know that functions are objects of the class 'function', with a __call__ method, and I am aware that I can make my own classes behave like functions by writing a __call__ method for them. But I can't figure out what precisely gets stored in memory when new functions are created, and how to access the information that gets stored.

To experiment, I wrote a little script that creates lots of function objects and stores them in a list. I noticed that this program used up a lot of memory.

funct_list = []
for i in range(10000000):
    def funct(n):
        return n + i
    funct_list.append(funct)

My questions are:

  • What precisely gets stored in RAM when I define a new function object? Am I storing the details of how the function is to be implemented?

  • If so, does my function object have attributes or methods that allow me to "inspect" (or possibly even "alter retrospectively") the way the function behaves?

  • Maybe my previous question is circular, because the methods of the function object are functions in their own right...

  • In my code above, some of the RAM is used simply to store the "pointers" to my function objects in the list. The rest of the RAM is presumably used to store the interesting stuff about how my function objects actually work. Roughly how is the RAM distributed between these two purposes?

  • Suppose I alter the code snippet by making the function do more complicated stuff. Will I use up much more RAM as a consequence? (I would expect so. But when I altered the definition of my function by filling its body with 1000 lines of junk, there didn't appear to be any difference in the amount of RAM used up.)

I would love to find a comprehensive reference about this. But whatever I type into Google, I can't seem to find what I'm looking for!

asked Aug 17 '17 23:08

Kenny Wong


2 Answers

A function object's data is divided into two primary parts. The parts that would be the same for all functions created by the same function definition are stored in the function's code object, while the parts that can change even between functions created from the same function definition are stored in the function object.
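To make the split concrete, here is a minimal sketch (the names make_adder, add1 and add2 are made up for illustration). It creates two functions from one definition and checks which parts are shared and which are per-function:

```python
import sys

def make_adder(i):
    # Each call to make_adder runs this def statement again and
    # builds a fresh function object.
    def funct(n):
        return n + i
    return funct

add1 = make_adder(1)
add2 = make_adder(2)

# Two distinct function objects...
print(add1 is add2)
# ...that share one code object (the compiled body).
print(add1.__code__ is add2.__code__)
# The per-function part: each has its own closure cell holding i.
print(add1.__closure__[0].cell_contents, add2.__closure__[0].cell_contents)
# Size of the function object itself, not counting the shared code object.
print(sys.getsizeof(add1))
```

This is consistent with the observation in the question that a much longer function body did not noticeably change memory use: the body lives in the shared code object, so each extra function only costs one more small function object.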

The most interesting part of a function is probably its bytecode. This is the core data structure that says what to actually do to execute a function. It's stored as a bytestring in the function's code object, and you can examine it directly:

>>> def fib(i):
...     x, y = 0, 1
...     for _ in range(i):
...         x, y = y, x+y
...     return x
... 
>>> fib.__code__.co_code
b'd\x03\\\x02}\x01}\x02x\x1et\x00|\x00\x83\x01D\x00]\x12}\x03|\x02|\x01|\x02\x17\x00\x02\x00}\x01}\x02q\x12W\x00|\x01S\x00'

...but it's not designed to be human-readable.

With enough knowledge of the implementation details of Python bytecode, you could parse that yourself, but describing all that would take way too long. Instead, we'll use the dis module to disassemble the bytecode for us:

>>> import dis
>>> dis.dis(fib)
  2           0 LOAD_CONST               3 ((0, 1))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               1 (x)
              6 STORE_FAST               2 (y)

  3           8 SETUP_LOOP              30 (to 40)
             10 LOAD_GLOBAL              0 (range)
             12 LOAD_FAST                0 (i)
             14 CALL_FUNCTION            1
             16 GET_ITER
        >>   18 FOR_ITER                18 (to 38)
             20 STORE_FAST               3 (_)
  4          22 LOAD_FAST                2 (y)
             24 LOAD_FAST                1 (x)
             26 LOAD_FAST                2 (y)
             28 BINARY_ADD
             30 ROT_TWO
             32 STORE_FAST               1 (x)
             34 STORE_FAST               2 (y)
             36 JUMP_ABSOLUTE           18
        >>   38 POP_BLOCK
  5     >>   40 LOAD_FAST                1 (x)
             42 RETURN_VALUE

There are a number of columns in the output here, but we're mostly interested in the column of ALL_CAPS names and the columns to the right of it.

The ALL_CAPS column shows the function's bytecode instructions. For example, LOAD_CONST loads a constant value, and BINARY_ADD is the instruction to add two objects with +. The next column, with the numbers, holds the bytecode arguments. For example, LOAD_CONST 3 says to load the constant at index 3 in the code object's constants. Bytecode arguments are always integers, and they're packed into the bytecode string along with the bytecode instructions. The last column mostly provides human-readable explanations of the bytecode arguments: for example, it says that the 3 in LOAD_CONST 3 corresponds to the constant (0, 1), or that the 1 in STORE_FAST 1 corresponds to local variable x. The information in this column doesn't actually come from the bytecode string; it's resolved by examining other parts of the code object.
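As a rough sketch of where that last column comes from, the tables that the integer arguments index into can be read straight off the code object. The exact constants and opcodes differ between Python versions, so no output is shown here:

```python
import dis

def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

code = fib.__code__

# Lookup tables that integer bytecode arguments index into.
print(code.co_consts)    # constants used by the function
print(code.co_varnames)  # local variable names, arguments first
print(code.co_names)     # global and attribute names, such as range

# dis.get_instructions yields the same information as dis.dis,
# but as objects: the raw integer argument and its resolved form.
for instr in dis.get_instructions(fib):
    print(instr.opname, instr.arg, instr.argrepr)
```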


The rest of a function object's data is primarily stuff needed to resolve bytecode arguments, like the function's closure or its global variable dict, and stuff that just exists because it's handy for introspection, like the function's __name__.

If we take a look at the Python 3.6 function object struct definition at C level:

typedef struct {
    PyObject_HEAD
    PyObject *func_code;    /* A code object, the __code__ attribute */
    PyObject *func_globals; /* A dictionary (other mappings won't do) */
    PyObject *func_defaults;    /* NULL or a tuple */
    PyObject *func_kwdefaults;  /* NULL or a dict */
    PyObject *func_closure; /* NULL or a tuple of cell objects */
    PyObject *func_doc;     /* The __doc__ attribute, can be anything */
    PyObject *func_name;    /* The __name__ attribute, a string object */
    PyObject *func_dict;    /* The __dict__ attribute, a dict or NULL */
    PyObject *func_weakreflist; /* List of weak references */
    PyObject *func_module;  /* The __module__ attribute, can be anything */
    PyObject *func_annotations; /* Annotations, a dict or NULL */
    PyObject *func_qualname;    /* The qualified name */

    /* Invariant:
     *     func_closure contains the bindings for func_code->co_freevars, so
     *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
     *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
     */
} PyFunctionObject;

we can see that there's the code object, and then

  • the global variable dict,
  • the default argument values,
  • the keyword-only argument default values,
  • the function's closure cells,
  • the docstring,
  • the name,
  • the __dict__,
  • the list of weak references to the function,
  • the __module__,
  • the annotations, and
  • the __qualname__, the fully qualified name

Inside the PyObject_HEAD macro, there's also the type pointer and the refcount (and some other metadata in a debug build). The GC also places some GC metadata right before each PyFunctionObject struct in memory.

We didn't have to go straight to C to examine most of that, since most of that data is available at Python level: we could have called dir() on the function and filtered out the non-instance attributes. But the struct definition provides a nice, commented, uncluttered list.
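For comparison, here is a small sketch of the Python-level attributes that correspond to those struct fields. The function greet and its custom attribute are invented for the example:

```python
def greet(name: str, greeting="hello", *, punct="!"):
    """Return a greeting."""
    return f"{greeting}, {name}{punct}"

# Arbitrary attributes end up in the function's __dict__.
greet.custom = 1

print(greet.__code__)                   # func_code
print(greet.__globals__ is globals())   # func_globals
print(greet.__defaults__)               # func_defaults
print(greet.__kwdefaults__)             # func_kwdefaults
print(greet.__closure__)                # func_closure (None: no free variables)
print(greet.__doc__)                    # func_doc
print(greet.__name__)                   # func_name
print(greet.__dict__)                   # func_dict
print(greet.__module__)                 # func_module
print(greet.__annotations__)            # func_annotations
print(greet.__qualname__)               # func_qualname
```

Most of these are writable, so assigning to, say, greet.__defaults__ changes how later calls behave.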

You can examine the code object struct definition too, but the contents aren't as clear if you're not already familiar with code objects, so I'm not going to embed it in the post. I'll just explain code objects.

The core component of a code object is a bytestring of Python bytecode instructions and arguments. We examined one of those earlier. In addition, the code object contains things like a tuple of the constants the function refers to, and a lot of other internal metadata required to figure out how to actually execute each instruction. Not all the metadata - some of it comes from the function object - but a lot of it. Some of it, like that tuple of constants, is fairly easily understandable, and some of it, like co_flags (a bunch of internal flags) or co_stacksize (the size of the stack used for temporary values) is more esoteric.
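A few of those metadata fields can be inspected as follows. This is only a sketch with a made-up generator function, countdown; the values of co_stacksize and co_consts are implementation details that vary by version:

```python
import inspect

def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

code = countdown.__code__

print(code.co_name)       # name the code object was compiled under
print(code.co_argcount)   # number of positional parameters
print(code.co_nlocals)    # number of local variables, arguments included
print(code.co_stacksize)  # evaluation stack depth the code needs
print(code.co_consts)     # constants referenced by the bytecode

# co_flags is a bit field; the inspect module names the individual bits.
print(bool(code.co_flags & inspect.CO_GENERATOR))
```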

answered Oct 20 '22 07:10

user2357112 supports Monica


Functions are objects just like any other: they are instances of a type (or class). You can get the type of a function using type(f), where f is a function, or use the types module (types.FunctionType).
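A quick check of this, using a throwaway function f:

```python
import types

def f():
    return 0

print(type(f))                                   # the function type
print(type(f) is types.FunctionType)             # same type object
print(isinstance(lambda: 0, types.FunctionType)) # lambdas share that type
print(isinstance(f, object))                     # and it is an ordinary object
```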

When you define a function, Python builds a function object and assigns a name to it. This machinery is hidden behind the def statement, but it works the same as the instantiation of any other type.

This means that in Python, function definitions are executed, unlike in some other languages. Among other things, functions don't exist until the flow of code reaches them, so you can't call a function before its definition has run.
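A minimal illustration, with a hypothetical function shout: calling the name before the def statement has run fails, and afterwards the name is just an ordinary binding to the function object.

```python
# The def statement below has not run yet, so the name is unbound.
try:
    result = shout("hi")
except NameError as exc:
    result = f"not defined yet: {exc}"
print(result)

def shout(text):
    return text.upper() + "!"

# Now the name is bound to a function object.
print(shout("hi"))

# Like any object, the function can be bound to another name.
alias = shout
print(alias is shout, alias.__name__)
```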

The inspect module lets you snoop around inside various kinds of objects. The "Types and members" table in its documentation is useful for seeing what kinds of components functions and related types of objects (such as methods) are made from, and how to get to them.

The actual code inside a function becomes a code object, which contains the byte code that is executed by the Python interpreter. You can see this using the dis module.

Looking at the help() of the types for functions and code objects is interesting, as it shows what arguments you need to pass in to build these objects. It is possible to make new functions from raw byte code, to copy byte code from one function to another but use a different closure, and so on.

help(type(lambda: 0))
help(type((lambda: 0).__code__))

You can also build code objects using the compile() function and then build functions out of them.
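As a sketch of that idea, the following compiles a small piece of source, pulls the function's code object out of the module's constants, and builds two functions from it with different globals. The names double, triple and FACTOR are invented for the example, and digging through co_consts relies on a CPython implementation detail:

```python
import types

source = "def double(n):\n    return n * FACTOR\n"
module_code = compile(source, "<sketch>", "exec")

# The compiled body of the def is stored as a constant of the module code.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

# Same code object, different global namespaces.
double = types.FunctionType(func_code, {"FACTOR": 2}, "double")
triple = types.FunctionType(func_code, {"FACTOR": 3}, "triple")

print(double(5), triple(5))
print(double.__code__ is triple.__code__)
```

Only the globals dict differs between the two functions, which is why the same bytecode produces different results.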

Fun Fact

Any object whose type has a __call__() method is callable. Functions are callable, and their type has a __call__() method. Which is callable. Which means it, too, has a __call__() method, which has a __call__() method, ad nauseam, ad infinitum.

How does a function actually get called, then? Python actually bypasses __call__ for objects with __call__ implemented in C, such as a Python function's __call__ method. Indeed, (lambda: 0).__call__ is a method-wrapper, which is used to wrap a C function.
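A short sketch of both points, with a made-up class Doubler. The method-wrapper type name is what CPython reports and may differ on other implementations:

```python
class Doubler:
    # Defining __call__ on the type makes instances callable.
    def __call__(self, n):
        return 2 * n

f = lambda: 0

print(callable(f), callable(Doubler()))
print(Doubler()(21))

# A function's __call__ is itself callable, and so on down the chain.
print(type(f.__call__).__name__)
print(f.__call__.__call__.__call__())
```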

answered Oct 20 '22 06:10

kindall