
Why is [0] a different function but 0 isn't?

I inspected the .__code__ objects of pairs of functions that I expected to differ, but found them identical for a variety of expressions. As far as I understand, if two code objects are identical, they compile to the same bytecode and are thus the "same" function.

The lists below show expressions inserted before ; pass in g, grouped by whether g's __code__ ends up the same as f's or different from it. Since f is a "do nothing" function, this suggests that everything under "same" never executes, including the long arithmetic. Further, a tuple is "same", but a list and a string are "diff", so we might conclude that unassigned expressions involving immutable literals aren't evaluated. But then there's 1/0, which might be an exception to that rule because it raises an Exception. Then what of 10**99 vs. 10**9? 10**99 doesn't raise an Exception and can be assigned.

Profiling told me little: in most cases "same" and "diff" had indistinguishable execution times. Whenever a difference was measurable, however, it was always for a "diff" expression.

If the "same" expressions never execute, how does Python determine what to execute and what not to? If they do execute, how can their code objects be the same?


Same:

  • 0, (0,), True, False, None
  • 10 ** 9
  • ()
  • -314159.265358 ** (1/12345) / 2.718281828 + 500 - 7j

Diff:

  • [0], {0: 0}
  • 10 ** 99
  • [], {}, ""

Comparison code:

def compare(fn1, fn2):
    for name in dir(fn1.__code__):
        if (name.startswith("co_") and
            name not in ("co_filename", "co_name", "co_firstlineno")):
            v1 = getattr(fn1.__code__, name)
            v2 = getattr(fn2.__code__, name)
            if v1 == v2:
                print(name.ljust(18), "same")
            else:
                print(name.ljust(18), "diff", v1, v2)

def f():
    pass

def g():
    10 ** 99; pass

compare(f, g)

The following attributes are excluded because they differ without affecting what is "executed" (correct me if I'm wrong): co_name (always), co_filename (in IPython), co_firstlineno (when run from a file). According to the docs, co_code is what should differ.
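To reproduce the "same"/"diff" lists without retyping g each time, the comparison can be narrowed to the raw bytecode alone. This is only a sketch: make and same_bytecode are hypothetical helper names, only co_code is compared (not co_consts or the other attributes), and the verdicts depend on the interpreter and version, so no expected output is shown.

```python
def make(body):
    """Build a function whose one-line body is `body` (hypothetical helper)."""
    namespace = {}
    exec("def fn(): " + body, namespace)
    return namespace["fn"]


def same_bytecode(body):
    """True if `body` compiles to the same raw bytecode as a bare `pass`."""
    return make(body).__code__.co_code == make("pass").__code__.co_code


# Each expression is placed before "; pass", as in the question.
for expr in ("0", "(0,)", "10 ** 9", "[0]", "{0: 0}", "10 ** 99", "1/0"):
    verdict = "same" if same_bytecode(expr + "; pass") else "diff"
    print(expr.ljust(10), verdict)
```

Because every function is defined on a single line, line-number differences cannot leak into the comparison.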


Note: the accepted answer misses an important piece of intuition: the code for an unassigned constant expression may be kept if storing the computed value would take more memory than storing the expression that computes it. That is the case with 10 ** 99 (at least, that's what was asserted in the comments). See the comments below the answer for further info.
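Whether a particular power expression gets folded into a single constant can be checked empirically. The sketch below assumes that a folded expression leaves no binary-operation instruction behind; is_folded is a hypothetical helper, and the exact size threshold is an implementation detail that is not asserted here.

```python
import dis


def is_folded(expr):
    """True if no binary-operation opcode survives compiling `expr`."""
    code = compile(expr, "<sketch>", "eval")
    # Opcode names vary by version; matching the BINARY_ prefix is an assumption.
    return not any(ins.opname.startswith("BINARY_")
                   for ins in dis.get_instructions(code))


for expr in ("10 ** 9", "10 ** 99"):
    value = eval(expr)
    print(expr.ljust(10), "folded:", is_folded(expr),
          "result bits:", value.bit_length())
```

Printing the bit length next to the verdict makes it easier to see how the size of the result relates to the decision.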

OverLordGoldDragon asked Oct 16 '22 02:10


1 Answer

All literals of the "diff" group are either not constants ([], {}) or not beneficial for optimisation (e.g. the expression 10 ** 99 takes less space than its value). All expressions of the "same" group evaluate to constants which can be discarded. Inspecting the bytecode shows that the expressions are removed completely:

>>> import dis
>>> # CPython 3.7.4
>>> def g(): 10/1; pass
>>> dis.dis(g)
1           0 LOAD_CONST               0 (None)
            2 RETURN_VALUE

Notably, none of the removed expressions change the observable behaviour. Whether a Python implementation removes unobservable behaviour or not is purely an implementation detail. Expressions with side-effects, such as 1/0, are not removed.

>>> # CPython 3.7.4
>>> def g(): 10/0; pass
>>> dis.dis(g)
1           0 LOAD_CONST               1 (10)
            2 LOAD_CONST               2 (0)
            4 BINARY_TRUE_DIVIDE
            6 POP_TOP
            8 LOAD_CONST               0 (None)
           10 RETURN_VALUE
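The same contrast can be checked programmatically rather than by reading the listings. This is a sketch under assumptions: g_safe, g_raises and keeps_division are hypothetical names, and because opcode names differ between CPython versions (BINARY_TRUE_DIVIDE in the 3.7.4 listing above, a generic BINARY_OP in newer releases), the check only matches on the BINARY_ prefix.

```python
import dis


def g_safe(): 10/1; pass      # constant expression without side effects
def g_raises(): 10/0; pass    # would raise ZeroDivisionError when executed


def keeps_division(fn):
    """True if a binary-operation instruction is still present in fn."""
    return any(ins.opname.startswith("BINARY_")
               for ins in dis.get_instructions(fn))


print("10/1 kept:", keeps_division(g_safe))
print("10/0 kept:", keeps_division(g_raises))
```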

For the expressions shown, the bytecode is the same on CPython 3.7.4, CPython 3.8.2 and PyPy 3.6.9 [PyPy 7.3.0].

On CPython 3.4.3, CPython 2.7.10 and PyPy 2.7.13 [PyPy 7.1.1], the constant expression 10/1 is evaluated at compile time, but the result is not discarded.

>>> # CPython 3.4.3
>>> def g(): 10/1; pass
>>> dis.dis(g)
1           0 LOAD_CONST               3 (10.0)
            3 POP_TOP
            4 LOAD_CONST               0 (None)
            7 RETURN_VALUE

The expression "" is discarded in any Python implementation available to me.
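One detail that may explain why "" still lands in the question's "diff" group even though no instructions are emitted for it: a string literal that is the first statement of a function body becomes that function's docstring. A small sketch follows; how the docstring is stored inside the code object varies between CPython versions, so the co_consts line is printed without an expected value.

```python
def f():
    pass


def g():
    ""; pass   # the leading string literal is taken as g's docstring


print("f.__doc__:", repr(f.__doc__))
print("g.__doc__:", repr(g.__doc__))
# The constants tuples may differ even when no instruction uses the string.
print("co_consts:", f.__code__.co_consts, g.__code__.co_consts)
```

This assumes the interpreter is not run with docstring stripping (-OO) enabled.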


As these optimisations are implementation details, there is no formal specification. If a deeper understanding is desired, the implementation itself should be consulted. For CPython, a good starting point is the peephole optimiser source code.

To keep the optimizer simple, it bails when the lineno table has complex encoding for gaps >= 255.

Optimizations are restricted to simple transformations occurring within a single basic block. All transformations keep the code size the same or smaller. For those that reduce size, the gaps are initially filled with NOPs. Later those NOPs are removed and the jump addresses retargeted in a single pass.

MisterMiyagi answered Oct 18 '22 10:10