Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to make Python generators as fast as possible?

For writing an event-driven simulator, I'm relying on simpy, which heavily uses Python generators. I'm trying to understand how to make generators as fast as possible, i.e., minimize the state save/restore overhead. I tried three alternatives

  1. All state stored in a class instance
  2. All state stored globally
  3. All state stored locally

and got the following results with Python 3.4.3:

class_generator 20.851247710175812
global_generator 12.802394330501556
local_generator 9.067587919533253

The code can be found here.

This feels counter-intuitive to me: Storing all state in the class instance means that only self needs to be saved/restored, whereas storing all state globally should ensure zero save/restore overhead.

Does anybody know why are class generators and global generators are slower than local generators?

like image 394
user1202136 Avatar asked Dec 05 '22 00:12

user1202136


2 Answers

The generator actually retains the actual call frames when yield happens. It doesn't really affect performance whether you have 1 or 100 local variables.

The performance difference really comes from how Python (here I am using the CPython a.k.a. the one that you'd download from http://www.python.com/, or have on your operating system as /usr/bin/python, but most implementations would have similar performance characteristics due to mostly the same reasons) behaves on different kinds of variable lookups:

  • Local variables are not actually named in Python; instead they're referred to by a number, and accessed by LOAD_FAST opcode.

  • Global variables are accessed using LOAD_GLOBAL opcode. They're referred to by name always, and thus every access needs an actual dictionary lookup.

  • The instance attribute accesses are the slowest, because self.foobar first needs to use LOAD_FAST to load a reference to self, then the LOAD_ATTR is used to find foobar on the referred-to object, and this is a dictionary lookup. Also, if the attribute is on the instance itself this is going to be OK, but if it is set on the class, the attribute lookup is going to be slower. You're also setting values on the instance, it is going to be even slower, because now it needs to do STORE_ATTR on the loaded instance. Further complicating is the fact that the class of the instance needs to be consulted as well - if the class happens to have a property descriptor by the same name, then it can alter the behaviour of reading and setting the attribute.

Thus the fastest generator is the one that only refers to local variables. It is a common idiom in Python code to store the value of global read-only variables into local variables to speed things up.

To demonstrate the differences, consider the code generated for these 3 variable accesses a, b and self.c:

a = 42

class Foo(object):
    def __init__(self):
        self.c = 42

    def foo(self):
        b = 42
        yield a
        yield b
        yield self.c

print(list(Foo().foo()))   # prints [42, 42, 42]

The relevant part of the disassembly for foo method is:

  8           6 LOAD_GLOBAL              0 (a)
              9 YIELD_VALUE
             10 POP_TOP

  9          11 LOAD_FAST                1 (b)
             14 YIELD_VALUE
             15 POP_TOP

 10          16 LOAD_FAST                0 (self)
             19 LOAD_ATTR                1 (c)
             22 YIELD_VALUE
             23 POP_TOP

The operands to LOAD_GLOBAL and LOAD_ATTR are references to the name a and c respectively; the numbers are indices on a table. The operand of LOAD_FAST is the number of the local variable in the table of local variable tables.


The only state that a generator needs to save is a reference to the stack frame, so saving and restoring state takes exactly the same time no matter how much state is involved and wherever you put the data.

The difference you see in the timings is purely down to the speed with which Python can access values: a local variable access is very fast, a global variable access requires looking up the value in the global dictionary so is slower, and a class member access requires accessing the local variable 'self' and then performing at least one dictionary lookup on that value (also the call to the class generator has to be converted into a call with a single parameter which in itself is slower than the other calls which have no parameters).

like image 31
Duncan Avatar answered Dec 06 '22 13:12

Duncan