Executing sliced Python byte codes sometimes results in "SystemError: unknown opcode"

Given a code object compiled from the following 3 lines of code:

code = compile('''a = 1 / 0 # bad stuff. avoid running this!
b = 'good stuff'
c = True''', '', 'exec')

which dis.dis(code) disassembles into:

  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (0)
              4 BINARY_TRUE_DIVIDE
              6 STORE_NAME               0 (a)

  2           8 LOAD_CONST               2 ('good stuff')
             10 STORE_NAME               1 (b)

  3          12 LOAD_CONST               3 (True)
             14 STORE_NAME               2 (c)
             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE

How do I extract and run just the byte codes for the second line, b = 'good stuff'?

For example, to extract and run just the byte codes for the last line, c = True, which starts at byte offset 12, I can slice the code object's co_code attribute (the raw byte codes) from index 12, construct a new types.CodeType object from the slice, and call exec on it:

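import types
# Note: this positional CodeType signature matches Python 3.7 and earlier;
# Python 3.8 added co_posonlyargcount (and the code.replace() method).
code3 = types.CodeType(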
    code.co_argcount,
    code.co_kwonlyargcount,
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    code.co_code[12:],
    code.co_consts,
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    code.co_lnotab,
    code.co_freevars,
    code.co_cellvars)
exec(code3)
print(eval('c'))

so that it correctly outputs the value of c as assigned:

True

However, if I attempt to extract and run just the byte codes for the second line, b = 'good stuff', which ranges from index 8 to 12 (not including 12):

code2 = types.CodeType(
    code.co_argcount,
    code.co_kwonlyargcount,
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    code.co_code[8:12],
    code.co_consts,
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    code.co_lnotab,
    code.co_freevars,
    code.co_cellvars)
exec(code2)
print(eval('b'))

it produces:

XXX lineno: 1, opcode: 0
Traceback (most recent call last):
  File "/path/file.py", line 21, in <module>
    exec(code2)
  File "", line 1, in <module>
SystemError: unknown opcode

Calling dis.dis(code2) would show that the new code object appears to contain the right byte codes for b = 'good stuff':

  1           0 LOAD_CONST               2 ('good stuff')
              2 STORE_NAME               1 (b)

So what am I missing?

blhsing asked Dec 24 '22 00:12


1 Answer

I'm answering my own question because I could find no documentation on this topic, and it took me a while to figure out what I was missing, so it might benefit others who run into the same issue.

It turns out that every code object is required to end by returning a value; leaving out the return is not an option. If there's no explicit return statement, the compiler adds an implicit return of None, as evidenced by the last two instructions shown in the question:

             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE
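
A quick way to check this for any snippet is to look at the last instruction the compiler emits. The following is only a minimal sketch: the one-line source string is an arbitrary example, and the exact opcode name depends on the Python version (for instance RETURN_VALUE or RETURN_CONST), so the check only looks at the RETURN prefix.

```python
import dis

# Arbitrary example source; any module-level snippet should behave the same way.
source = "b = 'good stuff'"
code = compile(source, "", "exec")

# Decode the bytecode into Instruction objects and inspect the final one.
instructions = list(dis.get_instructions(code))
last = instructions[-1]

# The name varies across Python versions, so only the prefix is checked.
ends_with_return = last.opname.startswith("RETURN")
print(last.opname, ends_with_return)
```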

So by slicing the byte codes from index 12 for the last line, c = True, I inadvertently included the trailing implicit return of None, luckily satisfying the requirement for the code object to return a value.

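Such was not the case when I tried to slice the byte codes from index 8 to 12 for the second line, b = 'good stuff', as the slice left out the last two instructions that return None. With no return instruction to stop it, the interpreter presumably runs past the end of the sliced byte codes and hits an invalid opcode (the "opcode: 0" in the error output), hence the SystemError: unknown opcode exception.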

So to fix this, all that was needed was to append the last two instructions to the slice (4 bytes in total, since every instruction has been a fixed two-byte "word code" since Python 3.6):

code2 = types.CodeType(
    code.co_argcount,
    code.co_kwonlyargcount,
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    code.co_code[8:12] + code.co_code[-4:],
    code.co_consts,
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    code.co_lnotab,
    code.co_freevars,
    code.co_cellvars)
exec(code2)
print(eval('b'))

This would then correctly output:

good stuff
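
The hard-coded offsets above (8, 12 and -4) are specific to this snippet and to the interpreter version that produced the disassembly. As a rough sketch of the same idea without fixed offsets, the hypothetical helper below (slice_line is my own name, not a standard library function) looks up the byte range of a line with dis.findlinestarts, appends the trailing return, and builds the new code object with code.replace(), which assumes Python 3.8 or later. Hand-assembled code objects bypass the compiler's guarantees, so treat this as an experiment rather than a supported technique.

```python
import dis


def slice_line(code, lineno):
    """Return a copy of `code` that runs only the instructions for `lineno`.

    Hypothetical helper for illustration; assumes Python 3.8+ (code.replace).
    """
    raw = code.co_code
    starts = list(dis.findlinestarts(code))  # (offset, line number) pairs

    # Byte range of the requested line: from its first instruction up to
    # the first instruction of the next line (or the end of the bytecode).
    for i, (offset, line) in enumerate(starts):
        if line == lineno:
            end = starts[i + 1][0] if i + 1 < len(starts) else len(raw)
            break
    else:
        raise ValueError("no bytecode found for line %d" % lineno)
    body = raw[offset:end]

    # Locate the trailing implicit return. Depending on the version it is
    # either LOAD_CONST + RETURN_VALUE or a single RETURN_* instruction.
    instructions = list(dis.get_instructions(code))
    last = instructions[-1]
    if last.opname == "RETURN_VALUE":
        tail_start = instructions[-2].offset
    else:
        tail_start = last.offset

    # The last line already contains the return, so nothing to append there.
    tail = b"" if end == len(raw) else raw[tail_start:]
    return code.replace(co_code=body + tail)


code = compile('''a = 1 / 0 # bad stuff. avoid running this!
b = 'good stuff'
c = True''', '', 'exec')

ns = {}
exec(slice_line(code, 2), ns)  # runs only the second line
print(ns['b'])
```

Running the slice in its own namespace dictionary (ns) makes it easy to confirm that only b was assigned and that the division by zero on the first line never ran.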

blhsing answered Jan 04 '23 23:01