
Byte code of a compiled script differs based on how it was compiled [duplicate]

Earlier in the day, I was experimenting heavily with docstrings and the dis module, and came across something I can't seem to find the answer for.

First, I created a file test.py with the following content:

def foo():
    pass

Just this, and nothing else.

I then opened up an interpreter to observe the bytecode of the program. You can get it like this:

code = compile(open('test.py').read(), '', 'exec')

The first argument is the source code as a string, the second is a filename that is only used in tracebacks and debugging output (leaving it blank is O.K.), and the third is the mode. I've tried both single and exec. The result is the same.
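As a rough illustration of where those three arguments end up, here is a minimal sketch. It passes the source inline instead of reading test.py, and the label "<demo>" is a made-up filename chosen only for this example.

```python
import dis

source = "def foo():\n    pass\n"

# The second argument is only a label: it is stored on the code object
# and shows up in tracebacks and in the dis output. "<demo>" is hypothetical.
code = compile(source, "<demo>", "exec")

print(code.co_filename)  # the label passed as the second argument
print(code.co_names)     # names the module-level code stores to

# The body of foo is a separate code object kept in the module's constants.
inner = [c for c in code.co_consts if hasattr(c, "co_code")]
print(inner[0].co_name)

dis.dis(code)
```

The object returned by compile is a code object, which is what dis.dis expects. A raw string of bytes is a different thing.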

After this, you can decompile the bytecode with dis.

>>> import dis
>>> dis.dis(code)

The bytecode output is this:

 1           0 LOAD_CONST               0 (<code object foo at 0x10a25e8b0, file "", line 1>)
              3 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)
              9 LOAD_CONST               1 (None)
             12 RETURN_VALUE        

Reasonable, for such a simple script. And it made sense too.

Then I tried compiling it through command line like this:

$ python -m py_compile test.py

This generated the bytecode and placed it in a test.pyc file. The contents can again be disassembled with:

>>> import dis
>>> dis.dis(open('test.pyc').read())

And this is the output:

>>    0 ROT_THREE      
      1 <243>            2573
>>    4 <157>           19800
>>    7 BUILD_CLASS    
      8 DUP_TOPX            0
     11 STOP_CODE      
     12 STOP_CODE      
>>   13 STOP_CODE      
     14 STOP_CODE      
     15 STOP_CODE      
     16 STOP_CODE      
     17 POP_TOP        
     18 STOP_CODE      
     19 STOP_CODE      
     20 STOP_CODE      
     21 BINARY_AND     
     22 STOP_CODE      
     23 STOP_CODE      
     24 STOP_CODE      
     25 POP_JUMP_IF_TRUE    13
     28 STOP_CODE      
     29 STOP_CODE      
     30 LOAD_CONST          0 (0)
     33 MAKE_FUNCTION       0
     36 STORE_NAME          0 (0)
     39 LOAD_CONST          1 (1)
     42 RETURN_VALUE   
     43 STORE_SLICE+0  
     44 ROT_TWO        
     45 STOP_CODE      
     46 STOP_CODE      
     47 STOP_CODE      
     48 DUP_TOPX            0
     51 STOP_CODE      
     52 STOP_CODE      
     53 STOP_CODE      
     54 STOP_CODE      
     55 STOP_CODE      
     56 STOP_CODE      
     57 POP_TOP        
     58 STOP_CODE      
     59 STOP_CODE      
     60 STOP_CODE      
     61 INPLACE_POWER  
     62 STOP_CODE      
     63 STOP_CODE      
     64 STOP_CODE      
     65 POP_JUMP_IF_TRUE     4
     68 STOP_CODE      
     69 STOP_CODE      
     70 LOAD_CONST          0 (0)
     73 RETURN_VALUE   
     74 STORE_SLICE+0  
     75 POP_TOP        
     76 STOP_CODE      
     77 STOP_CODE      
     78 STOP_CODE      
     79 INPLACE_XOR    
     80 STORE_SLICE+0  
     81 STOP_CODE      
     82 STOP_CODE      
     83 STOP_CODE      
     84 STOP_CODE      
     85 STORE_SLICE+0  
     86 STOP_CODE      
     87 STOP_CODE      
     88 STOP_CODE      
     89 STOP_CODE      
     90 STORE_SLICE+0  
     91 STOP_CODE      
     92 STOP_CODE      
     93 STOP_CODE      
     94 STOP_CODE      
     95 STORE_SLICE+0  
     96 STOP_CODE      
     97 STOP_CODE      
     98 STOP_CODE      
     99 STOP_CODE      
    100 POP_JUMP_IF_TRUE     7
    103 STOP_CODE      
    104 STOP_CODE      
    105 LOAD_GLOBAL     29541 (29541)
    108 LOAD_GLOBAL     28718 (28718)
    111 SETUP_EXCEPT      884 (to 998)
    114 STOP_CODE      
    115 STOP_CODE      
    116 STOP_CODE      
    117 BUILD_TUPLE     28527
    120 POP_TOP        
    121 STOP_CODE      
    122 STOP_CODE      
    123 STOP_CODE      
    124 POP_JUMP_IF_TRUE     2
    127 STOP_CODE      
    128 STOP_CODE      
    129 STOP_CODE      
    130 POP_TOP        
    131 INPLACE_XOR    
    132 STORE_SLICE+0  
    133 POP_TOP        
    134 STOP_CODE      
    135 STOP_CODE      
    136 STOP_CODE      
    137 LOAD_LOCALS    
    138 STOP_CODE      
    139 STOP_CODE      
    140 STOP_CODE      
    141 STOP_CODE      
    142 STORE_SLICE+0  
    143 STOP_CODE      
    144 STOP_CODE      
    145 STOP_CODE      
    146 STOP_CODE      
    147 STORE_SLICE+0  
    148 STOP_CODE      
    149 STOP_CODE      
    150 STOP_CODE      
    151 STOP_CODE      
    152 STORE_SLICE+0  
    153 STOP_CODE      
    154 STOP_CODE      
    155 STOP_CODE      
    156 STOP_CODE      
    157 POP_JUMP_IF_TRUE     7
    160 STOP_CODE      
    161 STOP_CODE      
    162 LOAD_GLOBAL     29541 (29541)
    165 LOAD_GLOBAL     28718 (28718)
    168 SETUP_EXCEPT     2164 (to 2335)
    171 STOP_CODE      
    172 STOP_CODE      
    173 STOP_CODE      
    174 STORE_SUBSCR   
    175 IMPORT_FROM     25711 (25711)
    178 <117>           25964
    181 BINARY_LSHIFT  
    182 POP_TOP        
    183 STOP_CODE      
    184 STOP_CODE      
    185 STOP_CODE      
    186 POP_JUMP_IF_TRUE     0
    189 STOP_CODE      
    190 STOP_CODE      

The difference is staggering. Why is there such a stark contrast in the byte code depending on how it was compiled?

Asked by cs95 on Jun 23 '17 at 18:06



1 Answer

The contents of a .pyc file are not raw Python bytecode instructions. A .pyc file contains

  1. a 4-byte magic number,
  2. a 4-byte modification timestamp, and
  3. a marshalled code object.

You basically just disassembled garbage the second time.
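To make that concrete, the sketch below rebuilds the first eight bytes of the file from the first four entries of the bogus listing and reads them as a header instead. It assumes the listing came from Python 2.7, where ROT_THREE is opcode 3 and BUILD_CLASS is opcode 89; the sketch itself is written for Python 3.

```python
import struct

# Offsets 0-3 of the bogus listing: ROT_THREE (opcode 3), then the unknown
# opcode <243> with its 16-bit little-endian argument 2573.
magic_bytes = bytes([3, 243]) + struct.pack("<H", 2573)

# Offsets 4-7: unknown opcode <157> with argument 19800, then BUILD_CLASS
# (opcode 89, assuming the Python 2.7 opcode table).
mtime_bytes = bytes([157]) + struct.pack("<H", 19800) + bytes([89])

# Read the same bytes as header fields rather than as instructions.
magic, crlf = struct.unpack("<H2s", magic_bytes)
(mtime,) = struct.unpack("<I", mtime_bytes)

print(magic)  # version-specific magic value
print(crlf)   # the two bytes that end the magic number
print(mtime)  # source modification time, seconds since the epoch
```

Under those assumptions the magic value comes out as 62211, which as far as I know is the value Python 2.7 uses, followed by b'\r\n'. The timestamp is 1498241181, which falls on 23 June 2017, the day the question was asked. So the first "instructions" in the listing are really the header.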

If you want to disassemble the code from a .pyc, you can skip 8 bytes, unmarshal the code object, and then call dis.dis on the code object:

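import dis
import marshal

# .pyc files are binary, so open the file in 'rb' mode.
with open('test.pyc', 'rb') as f:
    f.seek(8)  # skip the 4-byte magic number and the 4-byte timestamp
    dis.dis(marshal.load(f))  # unmarshal the code object, then disassemble it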

Note that the .pyc format is free to change from version to version, so this might not always work. In fact, it already has changed since the time of the referenced article: Python 3.3 added 4 bytes after the timestamp for the source file size, so on 3.3 and up, you have to skip 12 bytes.
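One way to cope with the changing header is to pick the skip size from the running interpreter's version. This is only a sketch: load_pyc_code is a hypothetical helper, it assumes the .pyc was written by the same interpreter that reads it, and it relies on Python 3.7 and later using a 16-byte header (an extra 4-byte flags field after the magic number).

```python
import dis
import marshal
import os
import py_compile
import sys
import tempfile


def load_pyc_code(path):
    """Return the code object from a .pyc written by the running interpreter."""
    if sys.version_info >= (3, 7):
        header_size = 16  # magic, flags, then timestamp + size (or a hash)
    elif sys.version_info >= (3, 3):
        header_size = 12  # magic, timestamp, source size
    else:
        header_size = 8   # magic, timestamp
    with open(path, "rb") as f:
        f.seek(header_size)
        return marshal.load(f)


# Demo: compile a throwaway copy of the question's test.py, then read it back.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "test.py")
pyc = os.path.join(tmpdir, "test.pyc")
with open(src, "w") as f:
    f.write("def foo():\n    pass\n")
py_compile.compile(src, cfile=pyc, doraise=True)

code = load_pyc_code(pyc)
dis.dis(code)
```

The disassembly should then match what compile gives for the same source. A .pyc produced by a different Python version will generally fail to unmarshal, so comparing the magic number first is a sensible extra check.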

Answered by user2357112 supports Monica on Oct 28 '22 at 06:10