Earlier in the day, I was experimenting heavily with docstrings and the dis
module, and came across something I can't seem to find the answer for.
First, I create a file test.py
with the following content:
def foo():
pass
Just this, and nothing else.
I then opened up an interpreter to observe the bytecode of the program. You can get it like this:
code = compile(open('test.py').read(), '', 'exec')
The first argument is the code in string form, the second is for debugging purposes (leaving it blank is O.K.) while the 3rd is the mode. I've tried both single
and exec
. The result is the same.
After this, you can decompile the bytecode with dis
.
>>> import dis
>>> dis.dis(code)
The bytecode output is this:
1 0 LOAD_CONST 0 (<code object foo at 0x10a25e8b0, file "", line 1>)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (foo)
9 LOAD_CONST 1 (None)
12 RETURN_VALUE
Reasonable, for such a simple script. And it made sense too.
Then I tried compiling it through command line like this:
$ python -m py_compile test.py
This resulted in the bytecode generated and placed inside a test.pyc
file. The contents can again be disassembled with:
>>> import dis
>>> dis.dis(open('test.pyc').read())
And this is the output:
>> 0 ROT_THREE
1 <243> 2573
>> 4 <157> 19800
>> 7 BUILD_CLASS
8 DUP_TOPX 0
11 STOP_CODE
12 STOP_CODE
>> 13 STOP_CODE
14 STOP_CODE
15 STOP_CODE
16 STOP_CODE
17 POP_TOP
18 STOP_CODE
19 STOP_CODE
20 STOP_CODE
21 BINARY_AND
22 STOP_CODE
23 STOP_CODE
24 STOP_CODE
25 POP_JUMP_IF_TRUE 13
28 STOP_CODE
29 STOP_CODE
30 LOAD_CONST 0 (0)
33 MAKE_FUNCTION 0
36 STORE_NAME 0 (0)
39 LOAD_CONST 1 (1)
42 RETURN_VALUE
43 STORE_SLICE+0
44 ROT_TWO
45 STOP_CODE
46 STOP_CODE
47 STOP_CODE
48 DUP_TOPX 0
51 STOP_CODE
52 STOP_CODE
53 STOP_CODE
54 STOP_CODE
55 STOP_CODE
56 STOP_CODE
57 POP_TOP
58 STOP_CODE
59 STOP_CODE
60 STOP_CODE
61 INPLACE_POWER
62 STOP_CODE
63 STOP_CODE
64 STOP_CODE
65 POP_JUMP_IF_TRUE 4
68 STOP_CODE
69 STOP_CODE
70 LOAD_CONST 0 (0)
73 RETURN_VALUE
74 STORE_SLICE+0
75 POP_TOP
76 STOP_CODE
77 STOP_CODE
78 STOP_CODE
79 INPLACE_XOR
80 STORE_SLICE+0
81 STOP_CODE
82 STOP_CODE
83 STOP_CODE
84 STOP_CODE
85 STORE_SLICE+0
86 STOP_CODE
87 STOP_CODE
88 STOP_CODE
89 STOP_CODE
90 STORE_SLICE+0
91 STOP_CODE
92 STOP_CODE
93 STOP_CODE
94 STOP_CODE
95 STORE_SLICE+0
96 STOP_CODE
97 STOP_CODE
98 STOP_CODE
99 STOP_CODE
100 POP_JUMP_IF_TRUE 7
103 STOP_CODE
104 STOP_CODE
105 LOAD_GLOBAL 29541 (29541)
108 LOAD_GLOBAL 28718 (28718)
111 SETUP_EXCEPT 884 (to 998)
114 STOP_CODE
115 STOP_CODE
116 STOP_CODE
117 BUILD_TUPLE 28527
120 POP_TOP
121 STOP_CODE
122 STOP_CODE
123 STOP_CODE
124 POP_JUMP_IF_TRUE 2
127 STOP_CODE
128 STOP_CODE
129 STOP_CODE
130 POP_TOP
131 INPLACE_XOR
132 STORE_SLICE+0
133 POP_TOP
134 STOP_CODE
135 STOP_CODE
136 STOP_CODE
137 LOAD_LOCALS
138 STOP_CODE
139 STOP_CODE
140 STOP_CODE
141 STOP_CODE
142 STORE_SLICE+0
143 STOP_CODE
144 STOP_CODE
145 STOP_CODE
146 STOP_CODE
147 STORE_SLICE+0
148 STOP_CODE
149 STOP_CODE
150 STOP_CODE
151 STOP_CODE
152 STORE_SLICE+0
153 STOP_CODE
154 STOP_CODE
155 STOP_CODE
156 STOP_CODE
157 POP_JUMP_IF_TRUE 7
160 STOP_CODE
161 STOP_CODE
162 LOAD_GLOBAL 29541 (29541)
165 LOAD_GLOBAL 28718 (28718)
168 SETUP_EXCEPT 2164 (to 2335)
171 STOP_CODE
172 STOP_CODE
173 STOP_CODE
174 STORE_SUBSCR
175 IMPORT_FROM 25711 (25711)
178 <117> 25964
181 BINARY_LSHIFT
182 POP_TOP
183 STOP_CODE
184 STOP_CODE
185 STOP_CODE
186 POP_JUMP_IF_TRUE 0
189 STOP_CODE
190 STOP_CODE
The difference is staggering. Why is there such a stark contrast in the byte code depending on how it was compiled?
A byte code compiler translates a complex high-level language like Lisp into a very simple language that can be interpreted by a very fast byte code interpreter, or virtual machine. The internal representation of this simple language is a string of bytes, hence the name byte code.
When we execute a source code (a file with a . py extension), Python first compiles it into a bytecode. The bytecode is a low-level platform-independent representation of your source code, however, it is not the binary machine code and cannot be run by the target machine directly.
A compiled script file is just what you think it is: it's a file containing the bytecode of a compiled script. Unlike text, a compiled script file can be executed without being compiled (because it's already compiled); the runtime engine is fed the bytecode and can leap into action immediately.
When the bytecode is interperted, it is executed through the JVM interpreter, not directly on the processor, when it is compiled, it is compiled to native machine language and executed directly on the CPU.
The contents of a .pyc
file are not raw Python bytecode instructions. A .pyc
file contains
You basically just disassembled garbage the second time.
If you want to disassemble the code from a .pyc
, you can skip 8 bytes, unmarshal the code object, and then call dis.dis
on the code object:
import dis
import marshal
with open('test.pyc', 'b') as f:
f.seek(8)
dis.dis(marshal.load(f))
Note that the .pyc
format is free to change from version to version, so this might not always work. In fact, it already has changed since the time of the referenced article; they added 4 bytes after the timestamp for the source file size in Python 3.3, so on 3.3 and up, you have to skip 12 bytes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With