 

Reassemble .py file from bytecode

Tags:

python

Problem Statement

I have a file (no extension) containing some nicely formatted Python opcodes that I would like to reassemble into the original .py file (or as close to it as I can get).

Recreating the Problem

I can recreate a file like the one I have. Begin with a file called test.py, with the contents:

a = 1
b = 2
print(a+b)

By running python3 -m dis test.py, I get the following output:

  1       0 LOAD_CONST               0 (1)
          2 STORE_NAME               0 (a)

  2       4 LOAD_CONST               1 (2)
          6 STORE_NAME               1 (b)

  3       8 LOAD_NAME                2 (print)
         10 LOAD_NAME                0 (a)
         12 LOAD_NAME                1 (b)
         14 BINARY_ADD
         16 CALL_FUNCTION            1
         18 POP_TOP
         20 LOAD_CONST               2 (None)
         22 RETURN_VALUE

I would like to reconstruct the original test.py file from this output.
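Any reconstruction has to start by turning this text back into structured instructions. The sketch below parses exactly the column layout shown above. It assumes there are no jump-target `>>` markers and no nested code objects, and `LINE_RE` and `parse_dis` are hypothetical names used only for illustration. It also shows that the listing leaks partial constant and name tables through its "index (repr)" pairs. It does not recover things like the stack size or flags.

```python
import re

# The disassembler listing from the question, verbatim.
DIS_TEXT = """\
  1       0 LOAD_CONST               0 (1)
          2 STORE_NAME               0 (a)

  2       4 LOAD_CONST               1 (2)
          6 STORE_NAME               1 (b)

  3       8 LOAD_NAME                2 (print)
         10 LOAD_NAME                0 (a)
         12 LOAD_NAME                1 (b)
         14 BINARY_ADD
         16 CALL_FUNCTION            1
         18 POP_TOP
         20 LOAD_CONST               2 (None)
         22 RETURN_VALUE
"""

# Hypothetical pattern for this layout: optional source line number, offset,
# opcode name, optional numeric argument, optional "(argrepr)".
LINE_RE = re.compile(
    r"^(?:\s*(?P<lineno>\d+))?\s+(?P<offset>\d+)\s+(?P<opname>[A-Z][A-Z0-9_]*)"
    r"(?:\s+(?P<arg>\d+))?(?:\s+\((?P<argrepr>.*)\))?\s*$"
)


def parse_dis(text):
    """Return a list of (source_line, offset, opname, arg, argrepr) tuples."""
    instructions = []
    current_line = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines just separate source lines
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        if m.group("lineno") is not None:
            current_line = int(m.group("lineno"))  # later lines inherit it
        arg = m.group("arg")
        instructions.append((current_line, int(m.group("offset")), m.group("opname"),
                             None if arg is None else int(arg), m.group("argrepr")))
    return instructions


instructions = parse_dis(DIS_TEXT)

# Rebuild the parts of co_consts / co_names that the listing reveals.
consts, names = {}, {}
for _, _, opname, arg, argrepr in instructions:
    if opname == "LOAD_CONST":
        consts[arg] = argrepr
    elif opname in ("LOAD_NAME", "STORE_NAME"):
        names[arg] = argrepr

print(consts)  # {0: '1', 1: '2', 2: 'None'}
print(names)   # {0: 'a', 1: 'b', 2: 'print'}
```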

What I've tried

I have already tried running uncompyle6 on the output, but it errors out with the following message:

ImportError: Unknown magic number 8224 in test.pyc

I do not know which Python version generated the original file, so I cannot work out the magic number, nor do I know whether the magic number is the only thing missing from the file.

A similar question was asked here a long time ago: "Reassembling Python bytecode to the original code?" Its proposed answer is outdated, and following the updates, the current recommendation is to use uncompyle6, but I can't get that to work.

ceiltechbladhm asked Jan 02 '23 06:01


1 Answer

There is some confusion about what uncompyle6 does. It starts from Python bytecode (more accurately "wordcode" for Python 3.6 and later), and it is most often used to decompile a Python compiled (.pyc) file that contains such bytecode. It does not read the text that a disassembler prints.

Judging from what you show above, I believe what you want is to start from a text representation of bytecode produced by Python's built-in disassembler. That disassembler is version-specific: it comes with, and only fully works on bytecode for, the version of Python that is running it.
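To see how version-specific the disassembler's vocabulary is, this small sketch lists the opcode names that the running interpreter produces for the same test.py source. On Python 3.6 to 3.10 the addition appears as BINARY_ADD. From 3.11 on it is BINARY_OP, and CALL_FUNCTION is gone as well. So the listing in the question could not have come from a 3.11+ interpreter.

```python
import dis
import sys

# Same source as test.py in the question.
code = compile("a = 1\nb = 2\nprint(a+b)\n", "test.py", "exec")

# Opcode names as emitted by *this* interpreter's compiler.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(sys.version_info[:2], opnames)
```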

Here is why you get that strange ImportError from uncompyle6. It looks at the beginning of the text file that you (oddly) named as a Python compiled file, test.pyc. A real compiled file begins with an encoded Python version identifier, technically called a "magic number". Your file instead begins with the ASCII text "  1" (two spaces, then "1"), and uncompyle6 interprets the first two bytes, two spaces (0x20 0x20), as the magic number 8224.
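The arithmetic behind 8224 can be checked directly. This sketch assumes, consistent with the error message, that the decompiler reads the magic number as a little-endian 16-bit integer from the first two bytes of the file. For comparison, it also prints the real magic number of the running interpreter from `importlib.util.MAGIC_NUMBER`.

```python
import importlib.util
import struct

# First bytes of the saved disassembler text: two spaces (line-number column), then "1".
dis_text_start = b"  1       0 LOAD_CONST"

# Read the first two bytes as a little-endian unsigned 16-bit integer.
text_magic = struct.unpack("<H", dis_text_start[:2])[0]
print(text_magic, hex(text_magic))  # 8224 0x2020

# A genuine .pyc for the running interpreter starts with MAGIC_NUMBER:
# a 2-byte version-specific value followed by b"\r\n".
real_magic = int.from_bytes(importlib.util.MAGIC_NUMBER[:2], "little")
print(real_magic, importlib.util.MAGIC_NUMBER[2:])
```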

Never fear, though: I have written a few more tools to get you closer to where you want to be. Specifically, I wrote a cross-version Python assembler, the counterpart to Python's built-in disassembler.

This is in my github project python-xasm.

Using that, you can produce real Python bytecode that can be run. And if your assembly really does resemble something Python itself produced, it can probably be decompiled back into high-level Python.

However, xasm currently needs a little more help than what you have above. Specifically, it won't guess from opcode names which Python version(s) they could belong to. Matching opcode names with acceptable Python versions is even harder than you might think. If you see LOAD_CONST, you also need to consider whether the instruction takes 2 bytes or 3: if 2, it is Python 3.6 or later; otherwise it is earlier than 3.6. And if that is not hard enough already, some Python versions change the numeric opcode value for a given opcode name! So you may not be able to determine exactly which Python interpreter some assembly came from. But I am assuming you don't care, as long as whatever you come up with is consistent.
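The 2-byte "wordcode" rule and the interpreter-specific opcode numbering can both be seen with the standard dis module. This sketch assumes Python 3.6 or later. It prints the numeric opcode for STORE_NAME without asserting a particular value, because that value is exactly what can change between versions.

```python
import dis

code = compile("a = 1\nb = 2\n", "<demo>", "exec")

# Since Python 3.6 every code unit is 2 bytes (opcode, argument), so
# instruction offsets are always even and co_code has an even length.
offsets = [ins.offset for ins in dis.get_instructions(code)]
print(offsets)
print(len(code.co_code) % 2)  # 0

# Numeric opcode values belong to the running interpreter; the same name
# can map to a different number in another Python version.
store_name = dis.opmap["STORE_NAME"]
print(store_name, dis.opname[store_name])
```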

With that background, back to your question.

First, produce real bytecode. You could do it like this:

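import py_compile

# Note: the third positional parameter of py_compile.compile() is dfile, the
# file name recorded in the bytecode, not a compile mode. That is why the
# listings below report the file name as "exec".
py_compile.compile("/tmp/test.py", "/tmp/test.pyc", 'exec')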
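To confirm that the result is a real compiled file, its header can be read back. This sketch assumes Python 3.7 or later, where the header is 16 bytes (PEP 552): the magic number, a flags word, then either the source mtime and size or, for hash-based files, a source hash. It works in a temporary directory instead of /tmp so it can be run anywhere. The 23-byte size matches the "Source code size" line in the pydisasm listing.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

SOURCE = "a = 1\nb = 2\nprint(a+b)\n"  # 23 bytes, same as test.py

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.py")
    with open(src, "wb") as f:  # binary mode keeps "\n" line endings on every OS
        f.write(SOURCE.encode())
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "test.pyc"), doraise=True)
    with open(pyc, "rb") as f:
        header = f.read(16)

magic = header[:4]
flags, mtime, size = struct.unpack("<III", header[4:16])

print(magic == importlib.util.MAGIC_NUMBER)  # True: this is what uncompyle6 checks
print(int.from_bytes(magic[:2], "little"))   # the version number pydisasm reports
if flags == 0:  # timestamp-based file (the default unless SOURCE_DATE_EPOCH is set)
    print("mtime:", mtime, "source size:", size)
```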

Now, instead of using Python's built-in disassembler, run pydisasm, the cross-version disassembler I wrote that comes with xdis, on the compiled file. Use its --asm option, which outputs the assembly in a form xasm can read:

$ pydisasm --asm 
# pydisasm version 4.0.0-git
# Python bytecode 3.6 (3379)
# Disassembled from Python 3.6.5 (default, Aug 12 2018, 16:37:27)
# [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
# Timestamp in code: 1554492841 (2019-04-05 15:34:01)
# Source code size mod 2**32: 23 bytes

# Method Name:       <module>
# Filename:          exec
# Argument count:    0
# Kw-only arguments: 0
# Number of locals:  0
# Stack size:        3
# Flags:             0x00000040 (NOFREE)
# First Line:        1
# Constants:
#    0: 1
#    1: 2
#    2: None
# Names:
#    0: a
#    1: b
#    2: print
  1:
            LOAD_CONST           (1)
            STORE_NAME           (a)

  2:
            LOAD_CONST           (2)
            STORE_NAME           (b)

  3:
            LOAD_NAME            (print)
            LOAD_NAME            (a)
            LOAD_NAME            (b)
            BINARY_ADD
            CALL_FUNCTION        1
            POP_TOP
            LOAD_CONST           (None)
            RETURN_VALUE

Notice all of the additional information in the comments at the top, including some really arcane fields like "Stack size" and "Flags". This, and most of the other header information, has to be stored in a Python bytecode file, which is why the plain dis output in the question is not enough for xasm.
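Most of those header fields are attributes of the code object itself. As a standard-library sketch, `dis.code_info()` formats much the same header that pydisasm prints as comments. The values come from whichever Python runs the snippet, so the stack size and flags can differ from the 3.6 listing above.

```python
import dis

SOURCE = "a = 1\nb = 2\nprint(a+b)\n"

# Compile with "exec" as the file name, mirroring the py_compile call above.
code = compile(SOURCE, "exec", "exec")

# Name, file name, argument counts, stack size, flags, constants and names.
print(dis.code_info(code))

# The raw values live directly on the code object.
print(code.co_stacksize, hex(code.co_flags))
print(code.co_consts)
print(code.co_names)  # ('a', 'b', 'print')
```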

So save this to a file (here /tmp/test.pyasm), assemble that into bytecode, and then decompile it:

$ ./xasm/xasm_cli.py /tmp/test.pyasm
Wrote /tmp/test.pyc
$ uncompyle6 /tmp/test.pyc
# uncompyle6 version 3.2.6
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.6.5 (default, Aug 12 2018, 16:37:27)
# [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)]
# Embedded file name: exec
# Compiled at: 2019-04-05 15:34:01
# Size of source mod 2**32: 23 bytes
a = 1
b = 2
print(a + b)
# okay decompiling /tmp/test.pyc
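What "assemble to bytecode" must ultimately write is a header followed by a marshalled code object. The sketch below builds such a file image by hand from a compiled code object. It is not how xasm is implemented, only an illustration of the file layout, and it assumes Python 3.7 or later (the 16-byte, timestamp-based PEP 552 header).

```python
import importlib.util
import marshal
import struct
import time

source = "a = 1\nb = 2\nprint(a+b)\n"
code = compile(source, "exec", "exec")

# Header: magic number, flags=0 (timestamp-based), source mtime, source size.
header = importlib.util.MAGIC_NUMBER + struct.pack(
    "<III", 0, int(time.time()), len(source.encode()) & 0xFFFFFFFF
)
pyc_bytes = header + marshal.dumps(code)

# Reading it back: skip the 16-byte header and unmarshal the code object.
loaded = marshal.loads(pyc_bytes[16:])
exec(loaded)  # prints 3
```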

I gave a lightning talk related to this at PyCon 2018 in Medellín, Colombia. Sorry you missed it, but you can find a video of it here: http://rocky.github.io/pycon2018-light.co

It shows how to:

  • produce a Python compiled file from an ASCII-encoded Python source text,
  • modify it to remove tail recursion,
  • write that back out to a Python compiled file, and then
  • run the code.

Of course, you can't decompile that result, because no simple Python source corresponds closely to it: the bytecode was modified by hand.

Lastly, it seems you are also interested in how the bytecode and the source code are related. So I'll mention that uncompyle6 has a --tree option, and an even more verbose --grammar option, which show the steps taken to reconstruct the Python source from the bytecode.
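At a much coarser level than uncompyle6's parse trees, the standard library exposes the table that ties bytecode offsets to source lines. `dis.findlinestarts()` yields the offset where each source line begins, which is roughly what the "1:", "2:", "3:" labels in the listings above reflect. This is a sketch for illustration. The guard skips line 0 (the RESUME prologue on 3.11+) and None (possible on 3.13+).

```python
import dis

SOURCE = "a = 1\nb = 2\nprint(a+b)\n"
code = compile(SOURCE, "test.py", "exec")
source_lines = SOURCE.splitlines()

# (offset, line) pairs where a new source line starts in the bytecode.
starts = []
for offset, lineno in dis.findlinestarts(code):
    if not lineno:  # skip None and 0, which have no source text
        continue
    starts.append((offset, lineno))
    print(f"offset {offset:3}: line {lineno}: {source_lines[lineno - 1]}")
```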

rocky answered Jan 03 '23 20:01