Why does using an attribute instead of a method provide such a significant boost in Python speed?

I've been experimenting with a class that does pattern matching. My class looks something like this:

import re

class Matcher(object):
  def __init__(self, pattern):
    self._re = re.compile(pattern)

  def match(self, value):
    return self._re.match(value)

All told, my script takes ~45 seconds to run. As an experiment, I changed my code to:

class Matcher(object):
  def __init__(self, pattern):
    self._re = re.compile(pattern)
    self.match = self._re.match
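From the caller's side the two versions behave identically; what differs is what the name match resolves to. The following minimal sketch illustrates this. It renames the two variants MethodMatcher and AttributeMatcher (hypothetical names) and uses an arbitrary pattern chosen only for illustration:

```python
import re
import types


class MethodMatcher(object):
    """Delegates to the compiled pattern through a Python-level method."""

    def __init__(self, pattern):
        self._re = re.compile(pattern)

    def match(self, value):
        return self._re.match(value)


class AttributeMatcher(object):
    """Stores the compiled pattern's bound match on the instance."""

    def __init__(self, pattern):
        self._re = re.compile(pattern)
        self.match = self._re.match


method_m = MethodMatcher(r"b+a")
attr_m = AttributeMatcher(r"b+a")

# Both versions give equivalent results for the same input.
for text in ["ba", "bbba", "ab", ""]:
    a = method_m.match(text)
    b = attr_m.match(text)
    assert (a is None) == (b is None)
    if a is not None:
        assert a.span() == b.span()

# What "match" resolves to differs between the two.
print(type(method_m.match).__name__)  # a Python bound method
print(type(attr_m.match).__name__)    # the pattern's C-level method

# Each attribute access on method_m builds a new bound method object,
# while attr_m.match is the same object stored in the instance __dict__.
print(method_m.match is method_m.match)
print(attr_m.match is attr_m.match)
```

One trade-off of the attribute approach: because match lives in the instance dictionary, it shadows any match method a subclass might define.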

A run of this script took 37 seconds. No matter how many times I repeat this process, I see the same significant boost in performance. Running it through cProfile shows something like this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 46100979   14.356    0.000   14.356    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
 44839409    9.287    0.000   20.031    0.000 matcher.py:266(match)
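A profile like this one can be produced with the standard cProfile and pstats modules. The sketch below is minimal and substitutes a hypothetical run_matches workload for the real script. Sorting by tottime separates the time spent inside the Python-level Matcher.match wrapper from the time spent in the compiled pattern's C-level match:

```python
import cProfile
import io
import pstats
import re


class Matcher(object):
    def __init__(self, pattern):
        self._re = re.compile(pattern)

    def match(self, value):
        return self._re.match(value)


def run_matches(matcher, n=100000):
    """Hypothetical workload: call matcher.match n times."""
    for _ in range(n):
        matcher.match("ba")


profiler = cProfile.Profile()
profiler.enable()
run_matches(Matcher(r"ba"))
profiler.disable()

# tottime is the time spent inside each function itself,
# excluding the functions it calls.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("tottime")
stats.print_stats(5)  # show only the top five rows
report = stream.getvalue()
print(report)
```

cProfile adds overhead to every function call it records, so the wrapper's share of a profile is likely larger than its share of an unprofiled run.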

Why on earth is the match method adding 9.2 seconds onto the run time? The most frustrating part is that I tried to recreate a simple case and was not able to do so. What am I missing here?

Edit: my simple test case had a bug. It now mimics the behavior I am seeing:

import re
import sys
import time

class X(object):
  def __init__(self):
    self._re = re.compile('.*a')

  def match(self, value):
    return self._re.match(value)

class Y(object):
  def __init__(self):
    self._re = re.compile('ba')
    self.match = self._re.match

inp = 'ba'
x = X()
y = Y()

sys.stdout.write("Testing with a method...")
sys.stdout.flush()
start = time.time()
for i in range(100000000):
  m = x.match(inp)
end = time.time()
sys.stdout.write("Done: "+str(end-start)+"\n")

sys.stdout.write("Testing with an attribute...")
sys.stdout.flush()
start = time.time()
for i in range(100000000):
  m = y.match(inp)
end = time.time()
sys.stdout.write("Done: "+str(end-start)+"\n")

Output:

$ python speedtest.py 
Testing with a method...Done: 50.6646981239
Testing with an attribute...Done: 35.5526258945

For reference, both are much faster with PyPy, but the attribute version still shows a significant gain over the method version:

$ pypy speedtest.py 
Testing with a method...Done: 6.15996003151
Testing with an attribute...Done: 3.57215714455
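The two classes in the test above compile different patterns ('.*a' for X, 'ba' for Y), so part of the measured gap may come from the regular expressions rather than from method versus attribute access. Here is a minimal sketch of a like-for-like comparison. It assumes timeit is an acceptable replacement for the manual time.time() loop, and it uses one pattern for both classes. Timings vary by machine and interpreter, so no particular numbers are claimed here:

```python
import re
import timeit


class X(object):
    def __init__(self, pattern):
        self._re = re.compile(pattern)

    def match(self, value):
        return self._re.match(value)


class Y(object):
    def __init__(self, pattern):
        self._re = re.compile(pattern)
        self.match = self._re.match


PATTERN = "ba"  # the same pattern for both classes
inp = "ba"
x = X(PATTERN)
y = Y(PATTERN)

N = 500000  # far fewer iterations than the original 100,000,000
names = {"x": x, "y": y, "inp": inp}

# repeat() runs several trials; the minimum is the least noisy estimate.
method_time = min(timeit.repeat("x.match(inp)", globals=names, number=N, repeat=3))
attr_time = min(timeit.repeat("y.match(inp)", globals=names, number=N, repeat=3))

print("method:    %.3f s" % method_time)
print("attribute: %.3f s" % attr_time)
```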
asked Sep 07 '12 19:09 by dave mankoff


1 Answer

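It's probably mostly the overhead of the additional function call. With the method, each call to x.match(inp) runs a Python-level function whose only job is to call the C-level match of the compiled pattern. With the attribute, y.match already is the pattern's match, so the call goes straight into C. Calling a Python function is relatively expensive performance-wise, because the interpreter has to set up an additional stack frame, bind the arguments, and so on. Here is a bare-bones example that demonstrates a similar slowdown: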

>>> import timeit
>>> timeit.timeit("f()", "g = (lambda: 1); f = lambda: g()")
0.2858083918486847
>>> timeit.timeit("f()", "f = lambda: 1")
0.13749289364989004

There is also the additional cost of two extra attribute lookups inside your method: looking up _re on self, then looking up match on that _re object. However, this is likely a smaller component, since these are essentially dictionary lookups, which are fast in Python. (The timeit example above shows roughly a twofold slowdown even though the double-call version performs only one extra name lookup, for g, so most of the cost must come from the call itself.)
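The two extra lookups can be seen in the wrapper's bytecode. Here is a minimal sketch using the standard dis module. Exact opcode names differ between CPython versions (LOAD_METHOD was folded into LOAD_ATTR in 3.12), so the sketch filters on both. The attribute version has no Python-level wrapper to disassemble at all:

```python
import dis
import re


class X(object):
    def __init__(self):
        self._re = re.compile("ba")

    def match(self, value):
        return self._re.match(value)


# Print the wrapper's bytecode: it loads self._re, then its .match,
# and only then calls into the compiled pattern.
dis.dis(X.match)

# Collect the names fetched by attribute-loading instructions.
looked_up = [
    ins.argval
    for ins in dis.get_instructions(X.match)
    if ins.opname in ("LOAD_ATTR", "LOAD_METHOD")
]
print(looked_up)
```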

answered Nov 14 '22 21:11 by BrenBarn