Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bytecode optimization

Tags:

python

Here are 2 simple examples. In the first example append method produces LOAD_ATTR instruction inside the cycle, in the second it only produced once and result saved in variable (ie cached). Reminder: I remember, that there extend method for this task which is much faster that this

setup = \
"""LIST = []
ANOTHER_LIST = [i for i in range(10**7)]

def appender(list, another_list):
    for elem in another_list:
        list.append(elem)

def appender_optimized(list, another_list):
    append_method = list.append
    for elem in another_list:
        append_method(elem)"""


import timeit

print(timeit.timeit("appender(LIST, ANOTHER_LIST)", setup=setup, number=10))
print(timeit.timeit("appender_optimized(LIST, ANOTHER_LIST)", setup=setup, number=10))

Results:

11.92684596051036
7.384205785584728

4.6 seconds difference (even for such a big list) is no joke - for my opinion such difference can not be counted as "micro optimization". Why Python does not do it for me? Because bytecode must be exact reflection of source code? Do compiler even optimize anything? For example,

def te():
    a = 2
    a += 1
    a += 1
    a += 1
    a += 1

produces

LOAD_FAST                0 (a)
LOAD_CONST               2 (1)
INPLACE_ADD
STORE_FAST               0 (a)

4 times instead of optimize into a += 4. Or do it optimize some famous things like producing bit shift instead of multiplying by 2? Am I misunderstand something about basic language concepts?

like image 650
m9_psy Avatar asked Jun 09 '16 23:06

m9_psy


People also ask

Does Python optimize bytecode?

As the Python source code goes through the compilation step, certain things such as numeric expressions, strings & tuples get optimized and stored in bytecode instruction.

What bytecode means?

Byte code is the intermediate code compiled and executed by a virtual machine (VM). Byte code can be used unchanged on any platform on which the VM operates.

What is bytecode example?

Different types of bytecode use different syntax, which can be read and executed by the corresponding virtual machine. A popular example is Java bytecode, which is compiled from Java source code and can be run on a Java Virtual Machine (JVM). Below are examples of Java bytecode instructions.

How does JIT optimization work?

To help the JIT compiler analyze the method, its bytecodes are first reformulated in an internal representation called trees, which resembles machine code more closely than bytecodes. Analysis and optimizations are then performed on the trees of the method. At the end, the trees are translated into native code.


2 Answers

Python is a dynamic language. This means that you have a lot of freedom in how you write code. Due to the crazy amounts of introspection that python exposes (which are incredibly useful BTW), many optimizations simply cannot be performed. For example, in your first example, python has no way of knowing what datatype list is going to be when you call it. I could create a really weird class:

class CrazyList(object):
    def append(self, value):
        def new_append(value):
            print "Hello world"

        self.append = new_append

Obviously this isn't useful, but I can write this and it is valid python. If I were to pass this type to your above function, the code would be different than the version where you "cache" the append function.

We could write a similar example for += (it could have side-effects that wouldn't get executed if the "compiler" optimized it away).

In order to optimize efficiently, python would have to know your types ... And for a vast majority of your code, it has no (fool-proof) way to get the type data so it doesn't even try for most optimizations.


Please note that this is a micro optimization (and a well documented one). It is useful in some cases, but in most cases it is unnecessary if you write idiomatic python. e.g. your list example is best written using the .extend method as you've noted in your post. Most of the time, if you have a loop that is tight enough for the lookup time of a method to matter in your overall program runtime, then either you should find a way to rewrite just that loop to be more efficient or even push the computation into a faster language (e.g. C). Some libraries are really good at this (numpy).


With that said, there are some optimizations that can be done safely by the "compiler" in a stage known as the "peephole optimizer". e.g. It will do some simple constant folding for you:

>>> import dis
>>> def foo():
...     a = 5 * 6
... 
>>> dis.dis(foo)
  2           0 LOAD_CONST               3 (30)
              3 STORE_FAST               0 (a)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE        

In some cases, it'll cache values for later use, or turn one type of object into another:

>>> def translate_tuple(a):
...   return a in [1, 3]
... 
>>> import dis
>>> dis.dis(translate_tuple)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               3 ((1, 3))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

(Note the list got turned into a tuple and cached -- In python3.2+ set literals can also get turned into frozenset and cached).

like image 82
mgilson Avatar answered Oct 23 '22 01:10

mgilson


In general Python optimises virtually nothing. It won't even optimise trivial things like x = x. Python is so dynamic that doing so correctly would be extremely hard. For example the list.append method can't be automatically cached in your first example because it could be changed in another thread, something which can't be done in a more static language like Java.

like image 22
Alex Hall Avatar answered Oct 23 '22 01:10

Alex Hall