Python Compilation/Interpretation Process


I'm trying to understand the Python compiler/interpreter process more clearly. Unfortunately, I have never taken a class on interpreters, nor have I read much about them.

Basically, my current understanding is that Python code in .py files is first compiled into Python bytecode (which I assume is what the .pyc files I occasionally see contain?). Next, the bytecode is compiled into machine code, a language the processor actually understands. I've also read the thread "Why python compile the source to bytecode before interpreting?"
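To make the .py to .pyc step described here concrete, below is a minimal sketch assuming CPython 3.7 or later, where a .pyc file is a 16-byte header followed by a marshalled code object. The module name demo.py and the temporary directory are hypothetical, chosen only for the illustration.

```python
import marshal
import os
import py_compile
import tempfile

# Write a tiny module into a temporary directory (hypothetical name "demo.py").
src_dir = tempfile.mkdtemp()
src_path = os.path.join(src_dir, "demo.py")
with open(src_path, "w") as f:
    f.write("def fib(n):\n    return n if n < 2 else fib(n - 2) + fib(n - 1)\n")

# Compile the source to bytecode; CPython writes the result to a .pyc file
# (normally under __pycache__) and returns its path.
pyc_path = py_compile.compile(src_path, doraise=True)
print(pyc_path)  # e.g. .../__pycache__/demo.cpython-312.pyc

# Assumes the 3.7+ layout: skip the 16-byte header, unmarshal the code object.
with open(pyc_path, "rb") as f:
    data = f.read()
module_code = marshal.loads(data[16:])

# Executing the module's code object defines fib in a fresh namespace.
namespace = {}
exec(module_code, namespace)
print(namespace["fib"](10))
```

The object loaded from the .pyc is the same kind of code object the answer below inspects with dis; running it with exec is roughly what the import system does once it finds an up-to-date cached .pyc.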

Could somebody give me a good explanation of the whole process keeping in mind that my knowledge of compilers/interpreters is almost non-existent? Or, if that's not possible, maybe give me some resources that give quick overviews of compilers/interpreters?

Thanks


asked Jul 21 '10 13:07

NickHalden


1 Answer

The bytecode is not actually translated into machine code, unless you are using some exotic implementation such as PyPy (whose JIT compiler does generate machine code).

Other than that, your description is correct. The bytecode is loaded into the Python runtime and interpreted by a virtual machine: a piece of code that reads each instruction in the bytecode and executes whatever operation it indicates. You can see this bytecode with the dis module, as follows (this session is from an old CPython 2.x release; newer versions print different opcodes and offsets):

>>> def fib(n): return n if n < 2 else fib(n - 2) + fib(n - 1)
...
>>> fib(10)
55
>>> import dis
>>> dis.dis(fib)
  1           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 JUMP_IF_FALSE            5 (to 17)
             12 POP_TOP
             13 LOAD_FAST                0 (n)
             16 RETURN_VALUE
        >>   17 POP_TOP
             18 LOAD_GLOBAL              0 (fib)
             21 LOAD_FAST                0 (n)
             24 LOAD_CONST               1 (2)
             27 BINARY_SUBTRACT
             28 CALL_FUNCTION            1
             31 LOAD_GLOBAL              0 (fib)
             34 LOAD_FAST                0 (n)
             37 LOAD_CONST               2 (1)
             40 BINARY_SUBTRACT
             41 CALL_FUNCTION            1
             44 BINARY_ADD
             45 RETURN_VALUE
>>>
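The session above was produced by Python 2. A minimal sketch of the same inspection on Python 3 follows, assuming version 3.4 or later (where dis.get_instructions exists). Expect opcode names and offsets that differ from the listing above, because CPython's instruction set changes between releases.

```python
import dis

def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

print(fib(10))

# Human-readable listing; the exact opcodes depend on the Python version.
dis.dis(fib)

# The same information as Instruction objects, convenient for programmatic use.
for instr in dis.get_instructions(fib):
    print(instr.offset, instr.opname, instr.argrepr)
```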

Detailed explanation

It is quite important to understand that the above code is never executed by your CPU, nor is it ever converted into something that is (at least, not in CPython, the official C implementation of Python). The CPU executes the virtual machine's code, which performs the work indicated by the bytecode instructions. When the interpreter wants to execute the fib function, it reads the instructions one at a time and does what they tell it to do. It looks at the first instruction, LOAD_FAST 0, and thus grabs parameter 0 (the n passed to fib) from wherever parameters are held and pushes it onto the interpreter's stack (Python's interpreter is a stack machine). On reading the next instruction, LOAD_CONST 1, it grabs constant number 1 from a collection of constants owned by the function, which happens to be the number 2 in this case, and pushes that onto the stack. You can actually see these constants:

>>> fib.func_code.co_consts
(None, 2, 1)
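In Python 3, func_code is spelled __code__ (Python 2.6+ also accepts that name). A small sketch of the code-object tables that the instruction arguments index into: co_consts for LOAD_CONST, co_varnames for LOAD_FAST and co_names for LOAD_GLOBAL. The exact contents of co_consts vary between Python versions, so the sketch does not rely on them.

```python
def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)

code = fib.__code__       # Python 2 spelling: fib.func_code
print(code.co_consts)     # constants available to LOAD_CONST (version-dependent)
print(code.co_varnames)   # local slots used by LOAD_FAST: just the parameter n
print(code.co_names)      # global names used by LOAD_GLOBAL: just fib
```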

The next instruction, COMPARE_OP 0, tells the interpreter to pop the two topmost stack elements and perform a less-than comparison between them, pushing the Boolean result back onto the stack. The fourth instruction, JUMP_IF_FALSE 5, determines, based on the Boolean value, whether to jump forward five bytes (to offset 17, as the listing shows) or continue on with the next instruction. All that verbiage explains the if n < 2 part of the conditional expression in fib. It will be a highly instructive exercise for you to tease out the meaning and behaviour of the rest of the fib bytecode. The only one I'm not sure about is POP_TOP; I'm guessing JUMP_IF_FALSE is defined to leave its Boolean argument on the stack rather than popping it, so it has to be popped explicitly (the dis documentation for these older versions does describe JUMP_IF_FALSE as leaving the top of the stack unchanged).
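To make the fetch-and-execute loop concrete, here is a toy stack machine that runs the fib listing above, transcribed by hand into (offset, opname, arg) tuples. It is a hypothetical sketch, not CPython's actual evaluation loop (which is C code in Python/ceval.c). It implements only the ten opcodes fib uses, with the old CPython 2.x semantics described here, including JUMP_IF_FALSE leaving its condition on the stack for POP_TOP to discard.

```python
import operator

# fib's bytecode from the dis listing above, as (offset, opname, arg) tuples.
FIB_CODE = [
    (0, "LOAD_FAST", 0), (3, "LOAD_CONST", 1), (6, "COMPARE_OP", 0),
    (9, "JUMP_IF_FALSE", 5), (12, "POP_TOP", None), (13, "LOAD_FAST", 0),
    (16, "RETURN_VALUE", None), (17, "POP_TOP", None), (18, "LOAD_GLOBAL", 0),
    (21, "LOAD_FAST", 0), (24, "LOAD_CONST", 1), (27, "BINARY_SUBTRACT", None),
    (28, "CALL_FUNCTION", 1), (31, "LOAD_GLOBAL", 0), (34, "LOAD_FAST", 0),
    (37, "LOAD_CONST", 2), (40, "BINARY_SUBTRACT", None),
    (41, "CALL_FUNCTION", 1), (44, "BINARY_ADD", None), (45, "RETURN_VALUE", None),
]
CONSTS = (None, 2, 1)            # same as fib.func_code.co_consts above
NAMES = ("fib",)                 # stands in for the function's co_names
COMPARE_OPS = {0: operator.lt}   # COMPARE_OP 0 means '<'; other codes omitted


def run(code, args, globals_):
    """Toy fetch-decode-execute loop over a hand-transcribed listing."""
    index_of = {offset: i for i, (offset, _, _) in enumerate(code)}
    stack = []
    pc = 0  # index of the next instruction to fetch
    while True:
        _, op, arg = code[pc]
        pc += 1
        if op == "LOAD_FAST":
            stack.append(args[arg])
        elif op == "LOAD_CONST":
            stack.append(CONSTS[arg])
        elif op == "LOAD_GLOBAL":
            stack.append(globals_[NAMES[arg]])
        elif op == "COMPARE_OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(COMPARE_OPS[arg](left, right))
        elif op == "JUMP_IF_FALSE":
            # Jump is relative to the next instruction; the condition stays on the stack.
            if not stack[-1]:
                pc = index_of[code[pc][0] + arg]
        elif op == "POP_TOP":
            stack.pop()
        elif op == "BINARY_SUBTRACT":
            right = stack.pop()
            left = stack.pop()
            stack.append(left - right)
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "CALL_FUNCTION":
            # arg is the number of positional arguments; the callable sits beneath them.
            call_args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*call_args))
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise NotImplementedError(op)


def vm_fib(n):
    # LOAD_GLOBAL "fib" resolves to vm_fib, so the recursion also runs on the toy VM.
    return run(FIB_CODE, [n], {"fib": vm_fib})


print(vm_fib(10))
```

Each recursive call re-enters run() with a fresh stack, loosely mirroring how CPython creates a new frame for every call.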

Even more instructive is to inspect the raw bytecode for fib thus:

>>> code = fib.func_code.co_code
>>> code
'|\x00\x00d\x01\x00j\x00\x00o\x05\x00\x01|\x00\x00S\x01t\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00t\x00\x00|\x00\x00d\x02\x00\x18\x83\x01\x00\x17S'
>>> import opcode
>>> op = code[0]
>>> op
'|'
>>> op = ord(op)
>>> op
124
>>> opcode.opname[op]
'LOAD_FAST'
>>>

Thus you can see that the first byte of the bytecode is the LOAD_FAST instruction. The next pair of bytes, '\x00\x00' (the number 0 as a 16-bit little-endian integer), is the argument to LOAD_FAST and tells the bytecode interpreter to load parameter 0 onto the stack.
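Extending that by hand, here is a sketch of a decoder for the raw string above, run on Python 3 against a bytes copy of it. It assumes the pre-3.6 CPython format this session shows: one opcode byte, followed by a two-byte little-endian argument when the opcode is at least 90 (HAVE_ARGUMENT in those versions). The small opcode table is hypothetical in the sense that it was read off by matching these bytes against the dis listing; modern Python's opcode module uses different numbers, so it cannot be used here. Note that on Python 3 indexing bytes already yields an int, so the ord() step is unnecessary.

```python
# Raw bytecode copied from the Python 2 session above, as a Python 3 bytes literal.
CODE = (b'|\x00\x00d\x01\x00j\x00\x00o\x05\x00\x01|\x00\x00S\x01t\x00\x00'
        b'|\x00\x00d\x01\x00\x18\x83\x01\x00t\x00\x00|\x00\x00d\x02\x00\x18'
        b'\x83\x01\x00\x17S')

HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte argument in old CPython
# Subset of old CPython opcode numbers, matched against the dis listing above.
OPNAMES_OLD = {
    1: "POP_TOP", 23: "BINARY_ADD", 24: "BINARY_SUBTRACT", 83: "RETURN_VALUE",
    100: "LOAD_CONST", 106: "COMPARE_OP", 111: "JUMP_IF_FALSE",
    116: "LOAD_GLOBAL", 124: "LOAD_FAST", 131: "CALL_FUNCTION",
}


def decode(code):
    """Yield (offset, opname, arg) for pre-3.6 CPython bytecode."""
    i = 0
    while i < len(code):
        op = code[i]  # an int on Python 3
        name = OPNAMES_OLD.get(op, "<%d>" % op)
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)  # little-endian 16-bit argument
            yield i, name, arg
            i += 3
        else:
            yield i, name, None
            i += 1


for offset, name, arg in decode(CODE):
    print(offset, name, "" if arg is None else arg)
```

The offsets it prints line up with the left-hand column of the dis listing, which is a quick way to confirm the one-byte/three-byte instruction layout.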

answered Sep 29 '22 19:09

Marcelo Cantos


Tags: python, python-internals, compiler-construction, interpreter