 

Why does Python's dis dislike lists?

In Python 2.7.2, why does

import dis
dis.dis("i in (2, 3)")

works as expected whereas

import dis
dis.dis("i in [2, 3]")

raises:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dis.py", line 45, in dis
    disassemble_string(x)
  File "/usr/lib/python2.7/dis.py", line 112, in disassemble_string
    labels = findlabels(code)
  File "/usr/lib/python2.7/dis.py", line 166, in findlabels
    oparg = ord(code[i]) + ord(code[i+1])*256
IndexError: string index out of range

Note that this doesn't affect Python 3.

asked May 06 '12 19:05 by Inkane


3 Answers

Short Answer

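In Python 2.x, the str type holds raw bytes, so dis assumes that a string you pass it is already compiled bytecode. It tries to disassemble the string as bytecode and, purely because of implementation details of Python bytecode, succeeds for i in (2,3). Obviously, though, what it prints is gibberish. The list version fails for the same accidental reason: opcodes numbered dis.HAVE_ARGUMENT (90) or higher are followed by two argument bytes. The closing ) is character 41, so dis reads it as an opcode with no argument, but ] is character 93, so dis tries to read two argument bytes past the end of the string and raises IndexError.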
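
The byte walk that Python 2.7's dis.findlabels performs can be sketched as below. This is a minimal illustration written in Python 3 so it runs today. It assumes 2.7's HAVE_ARGUMENT value of 90 (hardcoded, since Python 3's opcode table differs), and it reproduces only the loop that steps over opcodes and arguments, not the label bookkeeping. The function name walk_like_py27 is hypothetical. Run on the tuple version, it produces the same opcode and argument numbers as the BUILD_MAP 26912 / JUMP_FORWARD 10272 output in the long answer (105 is BUILD_MAP and 110 is JUMP_FORWARD in 2.7); on the list version it runs off the end of the string.

```python
# Python 2.7's opcode.HAVE_ARGUMENT, hardcoded because this sketch runs on
# Python 3, whose opcode table and instruction format are different.
HAVE_ARGUMENT_27 = 90


def walk_like_py27(code):
    """Walk a byte string the way Python 2.7's dis.findlabels does.

    Returns (offset, opcode, oparg) tuples, with oparg None for opcodes
    below HAVE_ARGUMENT_27. Like 2.7, it raises IndexError when an opcode
    that takes an argument is missing its two argument bytes.
    """
    decoded = []
    i = 0
    while i < len(code):
        offset, op = i, code[i]  # indexing bytes gives an int in Python 3
        i += 1
        oparg = None
        if op >= HAVE_ARGUMENT_27:
            # Two-byte little-endian argument, as in 2.7's findlabels;
            # this indexing raises IndexError if the bytes are missing.
            oparg = code[i] + code[i + 1] * 256
            i += 2
        decoded.append((offset, op, oparg))
    return decoded


print(walk_like_py27(b"i in (2,3)"))
# prints: [(0, 105, 26912), (3, 110, 10272), (6, 50, None), (7, 44, None),
#          (8, 51, None), (9, 41, None)]

try:
    walk_like_py27(b"i in [2, 3]")
except IndexError as exc:
    # ']' is byte 93, which asks for 2 argument bytes that are not there
    print("IndexError:", exc)
```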

In Python 3.x, the str type holds text and the bytes type holds raw bytes, so dis can tell compiled bytecode apart from source code, and it assumes it is getting source code when it gets a str.


Long Answer

Here's the thought process I followed to work this one out.

  1. I tried it on my Python (3.2):

    >>> import dis
    >>> dis.dis("i in (2,3)")  
      1           0 LOAD_NAME                0 (i)
                  3 LOAD_CONST               2 ((2, 3))
                  6 COMPARE_OP               6 (in)
                  9 RETURN_VALUE
    >>> dis.dis("i in [2,3]")
      1           0 LOAD_NAME                0 (i)
                  3 LOAD_CONST               2 ((2, 3))
                  6 COMPARE_OP               6 (in)
                  9 RETURN_VALUE
    

    Obviously, this works.

  2. I tried it on Python 2.7:

    >>> import dis
    >>> dis.dis("i in (2,3)")
              0 BUILD_MAP       26912
              3 JUMP_FORWARD    10272 (to 10278)
              6 DELETE_SLICE+0
              7 <44>
              8 DELETE_SLICE+1
              9 STORE_SLICE+1
    >>> dis.dis("i in [2,3]")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\dis.py", line 45, in dis
        disassemble_string(x)
      File "C:\Python27\lib\dis.py", line 112, in disassemble_string
        labels = findlabels(code)
      File "C:\Python27\lib\dis.py", line 166, in findlabels
        oparg = ord(code[i]) + ord(code[i+1])*256
    IndexError: string index out of range
    

    Aha! Notice also that the bytecode generated in Python 3.2 is what you would expect ("load i, load (2,3), test for membership, return the result"), whereas what you get in Python 2.7 is gibberish. Clearly, in 2.7 dis is disassembling the string as if it were bytecode, but in 3.2 it is compiling it as Python source first.

  3. I had a look in the source code for dis.dis. Here are the key points:

    Python 2.7:

    elif isinstance(x, str):
        disassemble_string(x)
    

    Python 3.2:

       elif isinstance(x, (bytes, bytearray)): # Raw bytecode
           _disassemble_bytes(x)
       elif isinstance(x, str):    # Source code
           _disassemble_str(x)
    

    Just for fun, let's check this by passing the same bytes to dis in Python 3:

    >>> dis.dis("i in (2,3)".encode())
              0 BUILD_MAP       26912
              3 JUMP_FORWARD    10272 (to 10278)
              6 <50>
              7 <44>
              8 <51>
              9 <41>
    

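    Aha! Gibberish! (Note, though, that it's slightly different gibberish: the opcode set has changed between Python versions. Opcodes 50, 51 and 41 were slice opcodes in 2.7 and no longer exist in Python 3, so they show up as bare numbers.)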
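
Both Python 3.2 listings in step 1 are identical because CPython folds a constant list used on the right of in into a tuple constant. That is an optimizer implementation detail, not a language guarantee. A small sketch for checking this on a modern Python 3 (3.4 or later, for dis.get_instructions, which also compiles a str as source code). The helper name instructions is hypothetical. Expect the opcode names to differ from the 3.2 listing: recent versions emit CONTAINS_OP instead of COMPARE_OP 6 (in), and may add a leading RESUME.

```python
import dis


def instructions(source):
    """Return (opname, argval) pairs for a Python source string.

    In Python 3, a str passed to dis.get_instructions is compiled as
    source code rather than read as raw bytecode.
    """
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(source)]


tuple_ops = instructions("i in (2, 3)")
list_ops = instructions("i in [2, 3]")

for op in tuple_ops:
    print(op)

# If CPython folded the list into a tuple constant, the sequences match.
print(tuple_ops == list_ops)
```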

answered Oct 19 '22 07:10 by Katriel


dis.dis expects bytecode as an argument, not Python source code. Although your first example "works", it doesn't produce any meaningful output. You probably want:

import compiler, dis

code = compiler.compile("i in [2, 3]", '', 'single')
dis.dis(code)

This works as expected (I tested it in 2.7 only).
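
The compiler module used above was deprecated in Python 2.6 and removed in Python 3. A sketch of the same approach with the builtin compile(), which takes the same (source, filename, mode) arguments and works in both 2.x and 3.x. The filename "<string>" is an arbitrary placeholder.

```python
import dis

# compile() turns source text into a code object; 'single' is the mode the
# interactive prompt uses, matching the compiler.compile call above.
code = compile("i in [2, 3]", "<string>", "single")

# dis.dis now receives a real code object, so the output is meaningful.
dis.dis(code)
```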

answered Oct 19 '22 08:10 by georg


If you are just trying to get bytecode for a simple expression, the simplest approach is to pass dis a lambda with your expression as its body:

>>> import dis
>>> dis.dis(lambda i : i in [3,2])
  1           0 LOAD_FAST                0 (i)
              3 LOAD_CONST               2 ((3, 2))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
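
A Python 3 variant of the lambda trick, using dis.get_instructions (3.4+) to collect opcode names instead of printing a listing. Because i is the lambda's parameter, it is a local and is loaded with LOAD_FAST; recent CPython releases may use a variant whose name starts with LOAD_FAST, which is why the check below is by prefix. Compiling the same text as a standalone expression treats i as a module-level name, so it is loaded with LOAD_NAME instead. The variable names are illustrative.

```python
import dis

func = lambda i: i in [3, 2]

# i is the lambda's parameter, so the body loads it as a fast local.
lambda_ops = [ins.opname for ins in dis.get_instructions(func)]

# The same expression compiled on its own looks i up by name instead.
source_ops = [ins.opname for ins in dis.get_instructions("i in [3, 2]")]

print(lambda_ops)
print(source_ops)
print(func(2), func(5))  # True False
```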

answered Oct 19 '22 09:10 by PaulMcG