What do the zeros in python function bytecode mean?

I'm trying to teach myself how Python bytecode works so that I can experiment with manipulating functions' code (just for fun, not for real usage). I started with some simple examples, such as:

def f(x):
    return x + 3/x

The bytecode is*:

(124, 0, 0, 100, 1, 0, 124, 0, 0, 20, 23, 83)

It makes sense to me that 124 is the LOAD_FAST opcode, and that the name of the object being loaded is f.__code__.co_varnames[0], where 0 is the number after 124. Likewise, 100 indicates a LOAD_CONST that loads f.__code__.co_consts[1], where 1 is the number after 100. But then there are several extra zeros, such as the second, third and fifth zeros, that seem to serve no purpose, at least to me. What do they indicate?

Textual bytecode:

>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (3)
              6 LOAD_FAST                0 (x)
              9 BINARY_DIVIDE       
             10 BINARY_ADD          
             11 RETURN_VALUE   

*Note: In Python 3 (where bytecodes may be different from above), the bytecode can be found via:

>>> list(f.__code__.co_code)
[124, 0, 100, 1, 124, 0, 27, 0, 23, 0, 83, 0]
Asked by user3002473, Aug 01 '14 15:08





1 Answer

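Many bytecodes take an argument: any bytecode with a codepoint at or above dis.HAVE_ARGUMENT does. In Python 2, and in Python 3 up to version 3.5, those that do are followed by a 2-byte argument in little-endian order. (From Python 3.6 on, CPython instead uses 2 bytes for every instruction, one for the opcode and one for the argument, which is the layout of the Python 3 list in the question's note.)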
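As a rough illustration of the opcode-plus-2-byte-argument layout, the tuple from the question can be decoded by hand. This is a minimal sketch, not how dis does it: the helper name decode_old_bytecode is made up, the opcode names are copied from the question's dis output, and HAVE_ARGUMENT is hardcoded to 90 on the assumption that this was its value in the Python 2 interpreter that produced the tuple.

```python
# Bytecode tuple from the question (Python 2 layout: 1-byte opcode,
# optionally followed by a 2-byte little-endian argument).
CODE = (124, 0, 0, 100, 1, 0, 124, 0, 0, 20, 23, 83)

# Assumption: dis.HAVE_ARGUMENT was 90 in the interpreter that produced CODE.
HAVE_ARGUMENT = 90

# Opcode names taken from the dis output shown in the question.
OPNAMES = {
    124: "LOAD_FAST",
    100: "LOAD_CONST",
    20: "BINARY_DIVIDE",
    23: "BINARY_ADD",
    83: "RETURN_VALUE",
}


def decode_old_bytecode(code):
    """Return (offset, opname, arg) triples; arg is None if there is none."""
    result = []
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # Low byte first, then high byte (little-endian).
            arg = code[i + 1] | (code[i + 2] << 8)
            size = 3
        else:
            arg = None
            size = 1
        result.append((i, OPNAMES.get(op, "<%d>" % op), arg))
        i += size
    return result


decoded = decode_old_bytecode(CODE)
for offset, name, arg in decoded:
    print(offset, name, "" if arg is None else arg)
```

The offsets this prints (0, 3, 6, 9, 10, 11) line up with the dis output in the question, and the "extra" zeros turn out to be the high bytes of the arguments.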

You can see the definition of which bytecodes Python currently uses, and what they mean, in the dis module documentation.

With 2 bytes you can give any bytecode an argument value between 0 and 65535. For bytecodes that need more, you can prefix the bytecode with the EXTENDED_ARG bytecode, adding 2 more bytes for a value between 0 and 4294967295. In theory you could use EXTENDED_ARG multiple times, but the CPython interpreter uses an int for the oparg variable and is thus, for practical purposes, limited to 4-byte values.
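Those ranges follow from simple byte arithmetic. The sketch below assumes the layout described in this answer, where the 2 bytes after EXTENDED_ARG supply the upper 16 bits of the argument and the 2 bytes after the real opcode supply the lower 16 bits. The function name combine_oparg is hypothetical and exists only for this illustration.

```python
def combine_oparg(extended_bytes, arg_bytes):
    """Combine two 2-byte little-endian fields into one argument value.

    extended_bytes: the 2 bytes following EXTENDED_ARG (upper 16 bits).
    arg_bytes: the 2 bytes following the real opcode (lower 16 bits).
    """
    high = int.from_bytes(bytes(extended_bytes), "little")
    low = int.from_bytes(bytes(arg_bytes), "little")
    return (high << 16) | low


# Largest value that fits in a plain 2-byte argument.
largest_plain = int.from_bytes(bytes([255, 255]), "little")

# Largest value reachable with a single EXTENDED_ARG prefix.
largest_extended = combine_oparg([255, 255], [255, 255])

# The first value that needs EXTENDED_ARG: one past the 2-byte range.
first_extended = combine_oparg([1, 0], [0, 0])

print(largest_plain, largest_extended, first_extended)
```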

As of Python 3.4 the dis module provides you with Instruction instances that make it easier to introspect each bytecode and its arguments. Using this we can walk through the bytecodes you found for your function f:

>>> def f(x):
...     return x + 3/x
... 
>>> f.__code__.co_varnames
('x',)
>>> f.__code__.co_consts
(None, 3)
>>> import dis
>>> instructions = dis.get_instructions(f)
>>> instructions
<generator object _get_instructions_bytes at 0x10be77048>
>>> instruction = next(instructions)
>>> instruction
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=0, starts_line=2, is_jump_target=False)

So the first opcode, 124 or LOAD_FAST, puts the value of the first local name on the stack; its argument is the 0 0 pair, which read as a little-endian integer is 0, an index into the code object's array of local names. dis has filled in the argval attribute, showing us that the first local name is x. The session above also shows how you can introspect the code object to see the list of names.

>>> instruction = next(instructions)
>>> instruction
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=3, argrepr='3', offset=3, starts_line=None, is_jump_target=False)

The next instruction pushes a constant onto the stack; the argument is now 1 0, little-endian for integer 1, which selects the second constant associated with the code object. The f.__code__.co_consts tuple shows that it is 3, and the Instruction object gives it too, as the argval attribute.

>>> next(instructions)
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=6, starts_line=None, is_jump_target=False)

Next we have another LOAD_FAST, pushing another reference to local name x onto the stack.

>>> next(instructions)
Instruction(opname='BINARY_TRUE_DIVIDE', opcode=27, arg=None, argval=None, argrepr='', offset=9, starts_line=None, is_jump_target=False)

This is a bytecode without an argument; the opcode 27 is below dis.HAVE_ARGUMENT. No argument is needed, because this opcode takes the top two values on the stack, divides them, and pushes the floating point result back on the stack. So the last x and the constant 3 are taken, divided, and the result is pushed back on.

>>> next(instructions)
Instruction(opname='BINARY_ADD', opcode=23, arg=None, argval=None, argrepr='', offset=10, starts_line=None, is_jump_target=False)

Another argument-less bytecode; this one adds up the top two stack values, replacing those with the outcome. The outcome of the BINARY_TRUE_DIVIDE is taken, and the value of x that was pushed on first, and the result is put back on the stack.
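To make the stack behaviour concrete, here is a toy interpreter for just the instructions in this walk-through. It is only a sketch: the instruction list, local names and constants are copied from the session above rather than read from a live code object, and run_toy is a made-up helper, not part of CPython or dis.

```python
# (opname, arg) pairs as reported by the Instruction objects above.
INSTRUCTIONS = [
    ("LOAD_FAST", 0),
    ("LOAD_CONST", 1),
    ("LOAD_FAST", 0),
    ("BINARY_TRUE_DIVIDE", None),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
CO_VARNAMES = ("x",)
CO_CONSTS = (None, 3)


def run_toy(x):
    """Evaluate INSTRUCTIONS using a list as the value stack."""
    local_values = {"x": x}
    stack = []
    for opname, arg in INSTRUCTIONS:
        if opname == "LOAD_FAST":
            # arg indexes the local names; push that local's value.
            stack.append(local_values[CO_VARNAMES[arg]])
        elif opname == "LOAD_CONST":
            # arg indexes the constants tuple.
            stack.append(CO_CONSTS[arg])
        elif opname == "BINARY_TRUE_DIVIDE":
            right = stack.pop()
            left = stack.pop()
            stack.append(left / right)
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unsupported opname: " + opname)
    raise RuntimeError("no RETURN_VALUE reached")


print(run_toy(2))  # same value as 2 + 3/2
```

Note that the operand pushed last is the right-hand side: the divide pops x first and then 3, so it computes 3/x rather than x/3.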

>>> next(instructions)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=11, starts_line=None, is_jump_target=False)

Last instruction, and another that doesn't take arguments. RETURN_VALUE ends the current frame, returning the top value from the stack as the result to the caller.
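The whole walk-through can also be done in one loop instead of repeated next() calls. This sketch relies only on dis.get_instructions and the Instruction attributes shown above. The exact opcodes, names and offsets it prints depend on the Python version you run it with, so they may well differ from the session in this answer.

```python
import dis


def f(x):
    return x + 3/x


# Collect the fields discussed above for every instruction in f.
rows = [
    (ins.offset, ins.opname, ins.arg, ins.argrepr)
    for ins in dis.get_instructions(f)
]

for offset, opname, arg, argrepr in rows:
    print(offset, opname, arg, argrepr)
```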

Answered by Martijn Pieters, Oct 15 '22 06:10
