For writing an event-driven simulator, I'm relying on simpy, which heavily uses Python generators. I'm trying to understand how to make generators as fast as possible, i.e., minimize the state save/restore overhead. I tried three alternatives
and got the following results with Python 3.4.3:
class_generator 20.851247710175812
global_generator 12.802394330501556
local_generator 9.067587919533253
The code can be found here.
This feels counter-intuitive to me: Storing all state in the class instance means that only self
needs to be saved/restored, whereas storing all state globally should ensure zero save/restore overhead.
Does anybody know why are class generators and global generators are slower than local generators?
The generator actually retains the actual call frames when yield happens. It doesn't really affect performance whether you have 1 or 100 local variables.
The performance difference really comes from how Python (here I am using the CPython a.k.a. the one that you'd download from http://www.python.com/, or have on your operating system as /usr/bin/python
, but most implementations would have similar performance characteristics due to mostly the same reasons) behaves on different kinds of variable lookups:
Local variables are not actually named in Python; instead they're referred to by a number, and accessed by LOAD_FAST
opcode.
Global variables are accessed using LOAD_GLOBAL
opcode. They're referred to by name always, and thus every access needs an actual dictionary lookup.
The instance attribute accesses are the slowest, because self.foobar
first needs to use LOAD_FAST
to load a reference to self
, then the LOAD_ATTR
is used to find foobar
on the referred-to object, and this is a dictionary lookup. Also, if the attribute is on the instance itself this is going to be OK, but if it is set on the class, the attribute lookup is going to be slower. You're also setting values on the instance, it is going to be even slower, because now it needs to do STORE_ATTR
on the loaded instance. Further complicating is the fact that the class of the instance needs to be consulted as well - if the class happens to have a property descriptor by the same name, then it can alter the behaviour of reading and setting the attribute.
Thus the fastest generator is the one that only refers to local variables. It is a common idiom in Python code to store the value of global read-only variables into local variables to speed things up.
To demonstrate the differences, consider the code generated for these 3 variable accesses a
, b
and self.c
:
a = 42
class Foo(object):
def __init__(self):
self.c = 42
def foo(self):
b = 42
yield a
yield b
yield self.c
print(list(Foo().foo())) # prints [42, 42, 42]
The relevant part of the disassembly for foo
method is:
8 6 LOAD_GLOBAL 0 (a)
9 YIELD_VALUE
10 POP_TOP
9 11 LOAD_FAST 1 (b)
14 YIELD_VALUE
15 POP_TOP
10 16 LOAD_FAST 0 (self)
19 LOAD_ATTR 1 (c)
22 YIELD_VALUE
23 POP_TOP
The operands to LOAD_GLOBAL
and LOAD_ATTR
are references to the name a
and c
respectively; the numbers are indices on a table. The operand of LOAD_FAST
is the number of the local variable in the table of local variable tables.
The only state that a generator needs to save is a reference to the stack frame, so saving and restoring state takes exactly the same time no matter how much state is involved and wherever you put the data.
The difference you see in the timings is purely down to the speed with which Python can access values: a local variable access is very fast, a global variable access requires looking up the value in the global dictionary so is slower, and a class member access requires accessing the local variable 'self' and then performing at least one dictionary lookup on that value (also the call to the class generator has to be converted into a call with a single parameter which in itself is slower than the other calls which have no parameters).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With