
Modifying Python bytecode

How can I modify bytecode and then turn the result back into something I can call as a Python function? I've been trying:

a = """
def fact():
    a = 8
    a = 0
"""
from dis import dis

c = compile(a, '<string>', 'exec')
w = c.co_consts[0].co_code
dis(w)

which disassembles to:

      0 LOAD_CONST          1 (1)
      3 STORE_FAST          1 (1)
      6 LOAD_CONST          2 (2)
      9 STORE_FAST          1 (1)
     12 LOAD_CONST          0 (0)
     15 RETURN_VALUE   

Supposing I want to get rid of the instructions at offsets 0 and 3, I call:

x = c.co_consts[0].co_code[6:16]
dis(x)

which results in:

      0 LOAD_CONST          2 (2)
      3 STORE_FAST          1 (1)
      6 LOAD_CONST          0 (0)
      9 RETURN_VALUE   

My problem is what to do with x. If I try exec x, I get "expected string without null bytes", and I get the same for exec w. Trying to compile x results in: compile() expected string without null bytes.

I'm not sure what the best way to proceed is. Maybe I need to create some kind of code object, but I'm not sure how. I assume it must be possible, since tools such as byteplay and other Python assemblers exist.

I'm using Python 2.7.10, but I'd like it to be future compatible (e.g. Python 3) if possible.

Bitmap Image asked Oct 26 '15 14:10


1 Answer

Update: For sundry reasons I have started writing a Cross-Python-version assembler. See https://github.com/rocky/python-xasm. It is still in very early beta.

As far as I know, there is no other currently maintained Python assembler. PEAK's BytecodeAssembler was developed for Python 2.6 and later modified to support early Python 2.7.

Judging by its documentation, it is pretty cool. But it relies on other PEAK libraries, which might be problematic.

I'll go through the whole example to give you a feel for what you'd have to do. It is not pretty, but then you should expect that.

Basically, after modifying the bytecode, you need to create a new types.CodeType object. You need a new one because most attributes of a code object are, for good reason, read-only. For example, the interpreter may have cached some of these values.
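A minimal sketch of that point, assuming Python 3.8 or later, where code objects have a replace() method that builds the new types.CodeType for you (on older versions the explicit constructor call shown further down is needed). The function and the constants 1000 and 2000 are purely illustrative; a value above 255 is used because some recent CPython versions may encode small integers in the instruction itself rather than in co_consts.

```python
import types

def fact():
    return 1000

code = fact.__code__

# Code object attributes are read-only: assigning to one fails.
try:
    code.co_consts = (2000,)
    assignment_failed = False
except (AttributeError, TypeError):
    assignment_failed = True

# So build a new code object that differs only in its constants.
new_consts = tuple(2000 if c == 1000 else c for c in code.co_consts)
new_code = code.replace(co_consts=new_consts)

# Wrap the new code object in a function so it can be called.
patched = types.FunctionType(new_code, fact.__globals__, "patched")

print(assignment_failed, fact(), patched())
```

The original function is untouched; only the new function sees the changed constant.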

Once you have created the new code object, you can pass it to anything that accepts a code object, such as exec or eval.
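As a sketch of two ways to run a code object, assuming Python 3 (where exec is a function): module-level code can go straight to exec, while a function's code object can be wrapped with types.FunctionType. The namespace dictionary and the variable names are illustrative.

```python
import types

source = """
def fact():
    a = 8
    a = 0
    return a
"""

module_code = compile(source, "<string>", "exec")

# 1. Run the module-level code; it defines `fact` in the given namespace.
namespace = {}
exec(module_code, namespace)
result_via_exec = namespace["fact"]()

# 2. Pull the function's code object out of the module's constants
#    and wrap it in a function object directly.
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
fact = types.FunctionType(fn_code, {})
result_via_function = fact()

print(result_via_exec, result_via_function)
```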

Or you can write this to a bytecode file. Alas, the code format has changed between Python versions 1.3, 1.5, 2.0, 3.0, and 3.8. And, by the way, so have the optimizations and the bytecodes themselves. In fact, since Python 3.6 they are wordcodes, not bytecodes.
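To see which format your own interpreter uses, the sketch below prints its bytecode magic number and the instruction offsets of a small function. It assumes Python 3.6 or later, where instructions are laid out in two-byte units so every offset is even; the exact opcodes printed vary by version.

```python
import dis
import importlib.util

def fact():
    a = 8
    a = 0
    return a

# First four bytes of every .pyc written by this interpreter.
magic = importlib.util.MAGIC_NUMBER
print("magic:", magic)

# dis.get_instructions decodes the bytecode for whatever version is running,
# so this loop does not depend on a particular instruction layout.
instructions = list(dis.get_instructions(fact))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

offsets_are_even = all(ins.offset % 2 == 0 for ins in instructions)
```

Slicing co_code by hard-coded byte offsets, as in the Python 2.7 example, only works for the version whose layout you inspected.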

So here is what you'd have to do for your example:

a = """
def fact():
    a = 8
    a = 0
    return a
"""
c = compile(a, '<string>', 'exec')
fn_code = c.co_consts[0] # Pick up the function code from the main code
from dis import dis
dis(fn_code)
print("=" * 30)

x = fn_code.co_code[6:16] # modify bytecode

import types
opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount,  Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this you changed
                             fn_code.co_consts,
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,   # In general, You should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)
dis(opt_fn_code)
print("=" * 30)
print("Result is", eval(opt_fn_code))

# Now let's change the value of what's returned
co_consts = list(opt_fn_code.co_consts)
co_consts[-1] = 10

opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount,  Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this you changed
                             tuple(co_consts), # this is now changed too
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,   # In general, You should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)

dis(opt_fn_code)
print("=" * 30)
print("Result is now", eval(opt_fn_code))

When I ran this, here is what I got:

  3           0 LOAD_CONST               1 (8)
              3 STORE_FAST               0 (a)

  4           6 LOAD_CONST               2 (0)
              9 STORE_FAST               0 (a)

  5          12 LOAD_FAST                0 (a)
             15 RETURN_VALUE
==============================
  3           0 LOAD_CONST               2 (0)
              3 STORE_FAST               0 (a)

  4           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE
==============================
('Result is', 0)
  3           0 LOAD_CONST               2 (10)
              3 STORE_FAST               0 (a)

  4           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE
==============================
('Result is now', 10)

Notice that the line numbers haven't changed even though I removed a couple of lines of code. That is because I didn't update fn_code.co_lnotab.
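One way to inspect the line table without decoding co_lnotab by hand is dis.findlinestarts, which yields (offset, line number) pairs. A minimal sketch, assuming Python 3; in newer versions some entries may carry a line number of None, so those are filtered out.

```python
import types
from dis import findlinestarts

source = "def fact():\n    a = 8\n    a = 0\n    return a\n"
module_code = compile(source, "<string>", "exec")
fn_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

# Map each bytecode offset that starts a new source line to that line number.
line_starts = [(off, line) for off, line in findlinestarts(fn_code) if line is not None]
for off, line in line_starts:
    print("offset", off, "-> line", line)

lines_seen = {line for _, line in line_starts}
```

If instructions are sliced out of co_code without updating the table, these offsets still describe the old layout, which is why the disassembly above shows stale line numbers.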

If you now want to write a Python bytecode file from this, here is what you'd do:

co_consts = list(c.co_consts)
co_consts[0] = opt_fn_code
c1 = types.CodeType(c.co_argcount,
                    # c.co_posonlyargcount, Add this in Python 3.8+
                    # c.co_kwonlyargcount,  Add this in Python3
                    c.co_nlocals,
                    c.co_stacksize,
                    c.co_flags,
                    c.co_code,
                    tuple(co_consts),
                    c.co_names,
                    c.co_varnames,
                    c.co_filename,
                    c.co_name,
                    c.co_firstlineno,
                    c.co_lnotab,   # In general, You should adjust this
                    c.co_freevars,
                    c.co_cellvars)

import marshal
import time
from struct import pack

with open('/tmp/testing.pyc', 'wb') as fp:  # binary mode: a .pyc is not text
    fp.write(pack('Hcc', 62211, '\r', '\n'))  # Python 2.7 magic number
    # In Python 3.7+ you need to write the PEP 552 flags word here, before the timestamp
    fp.write(pack('I', int(time.time())))     # timestamp
    # In Python 3.3+ you need to write out the source size mod 2**32 here
    fp.write(marshal.dumps(c1))

To simplify writing the boilerplate bytecode above, I've added a routine to xasm called write_pycfile().
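For comparison, here is a rough Python 3.7+ sketch of the same steps, assuming the 16-byte header described in PEP 552: the magic number, a flags word of 0 for timestamp-based invalidation, the source mtime, and the source size. The file name, the temporary directory, and the source size of 0 are placeholders chosen for illustration.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time

source = "def fact():\n    a = 8\n    a = 0\n    return a\n"
module_code = compile(source, "<string>", "exec")

header = (
    importlib.util.MAGIC_NUMBER                          # version-specific magic
    + struct.pack("<I", 0)                               # PEP 552 flags (0 = timestamp-based)
    + struct.pack("<I", int(time.time()) & 0xFFFFFFFF)   # source mtime
    + struct.pack("<I", 0)                               # source size mod 2**32 (placeholder)
)

pyc_path = os.path.join(tempfile.mkdtemp(), "testing.pyc")
with open(pyc_path, "wb") as fp:  # binary mode is required
    fp.write(header)
    fp.write(marshal.dumps(module_code))

# Read it back: skip the 16-byte header, unmarshal, and run the module code.
with open(pyc_path, "rb") as fp:
    data = fp.read()
loaded = marshal.loads(data[16:])
namespace = {}
exec(loaded, namespace)
print(namespace["fact"]())
```

A file written this way is only loadable by the same Python version that produced it, since both the magic number and the marshalled code format are version-specific.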

Now to check the results:

$ uncompyle6 /tmp/testing.pyc
# uncompyle6 version 2.9.2
# Python bytecode 2.7 (62211)
# Disassembled from: Python 2.7.12 (default, Jul 26 2016, 22:53:31)
# [GCC 5.4.0 20160609]
# Embedded file name: <string>
# Compiled at: 2016-10-18 05:52:13


def fact():
    a = 0
# okay decompiling /tmp/testing.pyc
$ pydisasm /tmp/testing.pyc
# pydisasm version 3.1.0
# Python bytecode 2.7 (62211) disassembled from Python 2.7
# Timestamp in code: 2016-10-18 05:52:13
# Method Name:       <module>
# Filename:          <string>
# Argument count:    0
# Number of locals:  0
# Stack size:        1
# Flags:             0x00000040 (NOFREE)
# Constants:
#    0: <code object fact at 0x7f815843e4b0, file "<string>", line 2>
#    1: None
# Names:
#    0: fact
  2           0 LOAD_CONST               0 (<code object fact at 0x7f815843e4b0, file "<string>", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_NAME               0 (fact)
              9 LOAD_CONST               1 (None)
             12 RETURN_VALUE


# Method Name:       fact
# Filename:          <string>
# Argument count:    0
# Number of locals:  1
# Stack size:        1
# Flags:             0x00000043 (NOFREE | NEWLOCALS | OPTIMIZED)
# Constants:
#    0: None
#    1: 8
#    2: 10
# Local variables:
#    0: a
  3           0 LOAD_CONST               2 (10)
              3 STORE_FAST               0 (a)

  4           6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
$

An alternative approach is to optimize at the Abstract Syntax Tree (AST) level. The compile, eval and exec functions can start from an AST, or you can dump the AST. You could also write this back out as Python source using the Python module astor.
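A minimal sketch of the AST route, assuming Python 3.8 or later (where literals are ast.Constant nodes). The ReplaceZero transformer is a hypothetical example that mirrors the constant change made at the bytecode level above; it is not part of any library.

```python
import ast

source = """
def fact():
    a = 8
    a = 0
    return a
"""

class ReplaceZero(ast.NodeTransformer):
    """Rewrite the integer constant 0 to 10 (an arbitrary illustrative change)."""

    def visit_Constant(self, node):
        if type(node.value) is int and node.value == 0:
            return ast.copy_location(ast.Constant(value=10), node)
        return node

tree = ast.parse(source)
tree = ReplaceZero().visit(tree)
ast.fix_missing_locations(tree)

# compile() accepts the modified AST directly.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["fact"]())
```

On Python 3.9+, ast.unparse(tree) would return the modified source as text, which serves the same purpose as astor on older versions.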

Note, however, that some kinds of optimization, like tail-recursion elimination, might leave bytecode in a form that can't be transformed faithfully back to source code. See my pycon2018 Columbia Lightning Talk for a video I made which eliminates tail recursion in bytecode to get an idea of what I'm talking about here.

If you want to be able to debug and single-step bytecode instructions, see my bytecode interpreter and its bytecode debugger.

rocky answered Sep 22 '22 14:09