How to read python bytecode?

I am having a lot of difficulty understanding Python's bytecode and its dis module.

import dis
def func():
   x = 1
dis.dis(func)

The above code when typed in the interpreter produces the following output:

    0 LOAD_CONST                  1(1)
    3 STORE_FAST                  0(x)
    6 LOAD_CONST                  0(NONE)
    9 RETURN_VALUE

For example, what do LOAD_CONST and STORE_FAST mean, and what are the numbers 0, 3, 6 and 9?

A pointer to a specific resource where I can find this information would be much appreciated.

Pratik Singhal asked Oct 24 '13 07:10



1 Answer

The numbers before each instruction are byte offsets into the raw binary bytecode:

>>> func.__code__.co_code
'd\x01\x00}\x00\x00d\x00\x00S'

Some bytecodes come with additional information (arguments) that influences how the bytecode works. The offset tells you at what position in the bytestream each bytecode was found.

For example, the LOAD_CONST opcode (ASCII d, hex 64) is followed by two additional bytes that encode a reference to a constant stored with the code object. As a result, the next opcode, STORE_FAST (ASCII }, hex 7D), is found at index 3.
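
The offsets and arguments can also be inspected programmatically. What follows is a minimal sketch, assuming Python 3.4 or later, where dis.get_instructions is available. Since Python 3.6 every instruction occupies two bytes, so the offsets (and, on newer interpreters, some of the opcode names) will differ from the Python 2 output quoted in this answer.

```python
import dis

def func():
    x = 1

code = func.__code__
raw = code.co_code  # the raw bytecode, a bytes object in Python 3

# Each Instruction carries its offset into `raw`, the opcode and its name,
# and the decoded argument (None if the opcode takes no argument).
instructions = list(dis.get_instructions(func))
for ins in instructions:
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)

# The byte stored at each offset is the numeric opcode of that instruction.
opcode_matches = [raw[ins.offset] == ins.opcode for ins in instructions]
```

The exact listing printed depends on the interpreter version, which is why no expected output is shown here.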

The dis module documentation lists what each instruction means. For LOAD_CONST, it says:

Pushes co_consts[consti] onto the stack.

This refers to the co_consts tuple that every code object carries; the compiler builds it:

>>> func.__code__.co_consts
(None, 1)

The opcode loads index 1 from that structure (the 01 00 bytes in the bytecode encode a 1), and dis has looked that up for you; it is the value 1.
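
The lookup that dis performs can be reproduced by hand. This is a rough sketch, assuming Python 3.4 or later; it uses dis.hasconst, the list of opcodes whose argument is an index into co_consts. Which instructions show up, and which constants the compiler stores, depends on the interpreter version.

```python
import dis

def func():
    x = 1

code = func.__code__

# For every instruction whose argument indexes co_consts, do the lookup
# by hand and keep it next to the value that dis decoded for us.
const_rows = []
for ins in dis.get_instructions(func):
    if ins.opcode in dis.hasconst:
        looked_up = code.co_consts[ins.arg]
        const_rows.append((ins.opname, ins.arg, looked_up, ins.argval))

for opname, index, looked_up, decoded in const_rows:
    print(opname, index, repr(looked_up), repr(decoded))
```

In each row the hand-made lookup and the value decoded by dis should agree.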

The next instruction, STORE_FAST is described as:

Stores TOS into the local co_varnames[var_num].

Here TOS stands for Top Of Stack; note that LOAD_CONST just pushed something onto the stack, namely the value 1. co_varnames is another structure on the code object; it holds the local variable names, and the opcode references index 0:

>>> func.__code__.co_varnames
('x',)

dis looked that up too: the name you used in your code is x. Thus, this opcode stores 1 in x.

Another LOAD_CONST loads None onto the stack from index 0, followed by RETURN_VALUE:

Returns with TOS to the caller of the function.

so this instruction takes the top of the stack (with the None constant) and returns from this code block. None is the default return value for functions without an explicit return statement.
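
To make the stack behaviour concrete, here is a toy interpreter for just these three instructions. It is an illustration only, not how CPython is implemented: the run function and the hand-written program list are hypothetical, and the instruction tuples are copied from the disassembly in this answer rather than read from a real code object.

```python
def run(instructions, consts, varnames):
    """Execute a tiny subset of instructions on an explicit value stack."""
    stack = []
    local_vars = {}
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            # Push co_consts[arg] onto the stack.
            stack.append(consts[arg])
        elif opname == "STORE_FAST":
            # Pop the top of the stack into the local named co_varnames[arg].
            local_vars[varnames[arg]] = stack.pop()
        elif opname == "RETURN_VALUE":
            # Hand the top of the stack back to the caller.
            return stack.pop(), local_vars
        else:
            raise ValueError("unsupported instruction: " + opname)
    raise RuntimeError("reached the end without RETURN_VALUE")

# Hand-written copy of the disassembly shown in this answer.
program = [
    ("LOAD_CONST", 1),
    ("STORE_FAST", 0),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE", None),
]
result, local_vars = run(program, consts=(None, 1), varnames=("x",))
print(result, local_vars)  # None {'x': 1}
```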

Your dis output omits one thing, the line numbers:

>>> dis.dis(func)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE        

Note the 2 on the first line; that's the line number in the original source that contains the Python code that was used for these instructions. Python code objects have co_lnotab and co_firstlineno attributes that let you map bytecodes back to line numbers in the original source. dis does this for you when displaying a disassembly.
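
The offset-to-line mapping can be read without decoding co_lnotab by hand. This sketch assumes Python 3 and uses dis.findlinestarts, which yields (offset, line number) pairs. The function is compiled from a string (the "<example>" filename is arbitrary) so that the line numbers are predictable, with x = 1 on line 2 as in the question.

```python
import dis

# Compile from a string so the line numbers are known: `x = 1` is on line 2.
source = "def func():\n    x = 1\n"
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
func = namespace["func"]

code = func.__code__
# Some Python versions report None for bytecode with no source line; skip those.
line_starts = [(offset, line) for offset, line in dis.findlinestarts(code)
               if line is not None]
for offset, line in line_starts:
    print("offset", offset, "-> line", line)
```

Depending on the version, an extra entry for line 1 (the def line) may appear in addition to the entry for line 2.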

Martijn Pieters answered Oct 05 '22 11:10