I'm trying to understand the performance of a generator function. I've used cProfile and the pstats module to collect and inspect profiling data. The function in question is this:
def __iter__(self):
    delimiter = None
    inData = self.inData
    lenData = len(inData)
    cursor = 0
    while cursor < lenData:
        if delimiter:
            mo = self.stringEnd[delimiter].search(inData[cursor:])
        else:
            mo = self.patt.match(inData[cursor:])
        if mo:
            mo_lastgroup = mo.lastgroup
            mstart = cursor
            mend = mo.end()
            cursor += mend
            delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
        else:
            raise SyntaxError("Unable to tokenize text starting with: \"%s\"" % inData[cursor:cursor+200])
self.inData is a unicode text string, self.stringEnd is a dict with 4 simple regexes, and self.patt is one big regex. The whole thing splits the big string into smaller strings, one by one.
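To make the setup concrete, here is a minimal, self-contained sketch of such a class. The token grammar (number/word/space/quote and the single stringEnd entry) is an invented placeholder, not the original patterns:

```python
import re

class Scanner:
    # Sketch of the tokenizer class described above; the patterns
    # here are illustrative placeholders, not the real grammar.
    def __init__(self, inData):
        self.inData = inData
        # one big alternation; mo.lastgroup names the token type that matched
        self.patt = re.compile(
            r"(?P<number>\d+)|(?P<word>[A-Za-z]+)|(?P<space>\s+)|(?P<quote>')"
        )
        # per-delimiter regexes that scan up to the closing delimiter
        self.stringEnd = {"quote": re.compile(r"(?P<string_body>[^']*')")}

    def __iter__(self):
        delimiter = None
        inData = self.inData
        lenData = len(inData)
        cursor = 0
        while cursor < lenData:
            if delimiter:
                mo = self.stringEnd[delimiter].search(inData[cursor:])
            else:
                mo = self.patt.match(inData[cursor:])
            if mo:
                mo_lastgroup = mo.lastgroup
                mstart = cursor
                mend = mo.end()
                cursor += mend
                # the value the consumer passes to send() selects the next mode
                delimiter = yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend)
            else:
                raise SyntaxError(
                    'Unable to tokenize text starting with: "%s"'
                    % inData[cursor:cursor + 200]
                )

# Driving the generator with send(): after a quote token the consumer
# switches the scanner into string mode.
tokens = iter(Scanner("abc 123 'hi' x"))
kinds = []
tok = next(tokens)
while True:
    kinds.append(tok[0])
    try:
        tok = tokens.send("quote" if tok[0] == "quote" else None)
    except StopIteration:
        break
```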
Profiling a program that uses it I found that the biggest part of the program's run time is spent in this function:
In [800]: st.print_stats("Scanner.py:124")
463263 function calls (448688 primitive calls) in 13.091 CPU seconds
Ordered by: cumulative time
List reduced from 231 to 1 due to restriction <'Scanner.py:124'>
ncalls tottime percall cumtime percall filename:lineno(function)
10835 11.465 0.001 11.534 0.001 Scanner.py:124(__iter__)
But looking at the function's callees, very little time is spent in the functions it calls:
In [799]: st.print_callees("Scanner.py:124")
Ordered by: cumulative time
List reduced from 231 to 1 due to restriction <'Scanner.py:124'>
Function called...
ncalls tottime cumtime
Scanner.py:124(__iter__) -> 10834 0.006 0.006 {built-in method end}
10834 0.009 0.009 {built-in method group}
8028 0.030 0.030 {built-in method match}
2806 0.025 0.025 {built-in method search}
1 0.000 0.000 {len}
The rest of the function is not much besides the while loop, assignments, and if-else. Even the send method on the generator, which I use to drive it, is fast:
ncalls tottime percall cumtime percall filename:lineno(function)
13643/10835 0.007 0.000 11.552 0.001 {method 'send' of 'generator' objects}
Is it possible that the yield, passing a value back to the consumer, is taking the majority of the time?! Is there anything else I'm not aware of?
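One way to sanity-check that suspicion is to time the two candidates in isolation. A rough sketch (the string size and repeat count are arbitrary):

```python
import timeit

big = "a" * 1_000_000  # roughly the size of a large input string

# Suspect 1: taking a tail slice of a big string -- each one copies ~1 MB.
t_slice = timeit.timeit("big[1:]", globals={"big": big}, number=1_000)

# Suspect 2: resuming a generator via send() -- essentially a frame switch.
def gen():
    while True:
        yield None

g = gen()
next(g)  # prime the generator before the first send()
t_send = timeit.timeit("g.send(None)", globals={"g": g}, number=1_000)

print(t_slice, t_send)
```

On typical CPython builds the slicing time dwarfs the send() time, which points away from yield as the culprit.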
EDIT:
I probably should have mentioned that the generator function __iter__ is a method of a small class, so self refers to an instance of that class.
This is actually the answer from Dunes, who unfortunately only gave it as a comment and doesn't seem inclined to post it as a proper answer.
The main performance culprit was the string slices. Some timing measurements showed that slicing performance degrades noticeably with big slices (i.e. taking a large slice from an already large string), since each slice copies its characters. To work around that I now use the pos parameter of the regex object methods:
if delimiter:
    mo = self.stringEnd[delimiter].search(inData, pos=cursor)
else:
    mo = self.patt.match(inData, pos=cursor)
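For illustration, a small sketch of why pos helps, plus one caveat: the offsets the match object reports become absolute, which affects the cursor arithmetic in the generator (the sizes here are arbitrary):

```python
import re

patt = re.compile(r"\w+")
data = "x" * 10_000 + " token"
cursor = 10_001  # position just past the space

# Slicing first copies the entire tail of the string...
mo_slice = patt.match(data[cursor:])
# ...while pos scans the original string in place, with no copy.
mo_pos = patt.match(data, pos=cursor)

assert mo_slice.group() == mo_pos.group() == "token"

# Caveat: with pos, start()/end() are absolute offsets into data, so the
# generator's bookkeeping changes from `cursor += mo.end()` to
# `cursor = mo.end()` (and mstart becomes mo.start()).
assert mo_slice.end() == 5
assert mo_pos.end() == 10_006
```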
Thanks to all who helped.