Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Overhead on looping over an iterable class

I was fiddling around with Python's generators and iterable class, just for fun. Basically I wanted test out something that I've never been too sure about: that classes in Pythons have some significant overhead and it's better to rely on methods that implement yield instead of classes that implement an iterator protocol, if you can.

I couldn't find a satisfying explanation on this topic in Google, so I decided to test them out on my own using these two simple scripts: func_iter.py and class_iter.py

Here's func_iter.py:

#!/usr/bin/env python

import time  

x = 0
def create_generator(num):
    mylist = range(num)
    for i in mylist:
        yield i

t = time.time()
gen = create_generator(100000)

for i in gen:
    x = x + i

print "%.3f" % (time.time() - t)

And here's class_iter.py:

#!/usr/bin/env python

import time

x = 0

class Generator(object):

    def __init__(self, num):
        self.start = 0
        self.end = num

    def __iter__(self):
        return self

    def next(self):
        if self.start == self.end:
            raise StopIteration
        else:
            self.start = self.start + 1
            return self.start

t = time.time()
gen = Generator(100000)

for i in gen:
    x = x + i

print "%.3f" % (time.time() - t)

I then ran each of them 10 times using this in bash (for class_iter.py, for example):

for i in {1..10}; do ./class_iter.py; done

And here are the average running times for each of them:

class_iter.py: 0.0864
func_iter.py: 0.0307

Now, my questions are:

  1. Are my methods correct? Is my comparison fair?
  2. If so, why the big difference? Why did class_iter.py take almost three times as long as func_iter.py to run?
  3. If not, how can I improve my methods or come up with a better comparison?

EDIT: As Dacav suggested, I also tried running func_iter.py using xrange instead of range. This decreases its average running time to 0.0263 seconds.

like image 802
bow Avatar asked May 12 '12 16:05

bow


1 Answers

The class version spends lots of time accessing its own variables. Each self.whatever costs cycles. If you define your __iter__ as a generator and minimize the use of instance variables, the difference between class and function versions will be negligible:

setup = """
def create_generator(num):
    mylist = range(num)
    for i in mylist:
        yield i

class Generator(object):

    def __init__(self, num):
        self.start = 0
        self.end = num

    def __iter__(self):
        return self

    def next(self):
        if self.start == self.end:
            raise StopIteration
        else:
            self.start = self.start + 1
            return self.start

class Generator2(object):

    def __init__(self, num):
        self.mylist = range(num)

    def __iter__(self):
        for i in self.mylist:
            yield i
"""

import timeit

print timeit.timeit('for p in create_generator(1000):p', setup, number=1000)
print timeit.timeit('for p in Generator(1000):p', setup, number=1000)
print timeit.timeit('for p in Generator2(1000):p', setup, number=1000)

Results:

0.158941984177
0.696810007095
0.160784959793

so the second generator class is almost as fast as the function version.

Please do note that Generator and Generator2 in the example are not fully equivalent, there are cases when you cannot simply replace a "plain" iterator with a generator (e.g. marshaling).

like image 179
georg Avatar answered Sep 20 '22 02:09

georg