I was fiddling around with Python's generators and iterable class, just for fun. Basically I wanted test out something that I've never been too sure about: that classes in Pythons have some significant overhead and it's better to rely on methods that implement yield
instead of classes that implement an iterator protocol, if you can.
I couldn't find a satisfying explanation on this topic in Google, so I decided to test them out on my own using these two simple scripts: func_iter.py
and class_iter.py
Here's func_iter.py
:
#!/usr/bin/env python
import time
x = 0
def create_generator(num):
mylist = range(num)
for i in mylist:
yield i
t = time.time()
gen = create_generator(100000)
for i in gen:
x = x + i
print "%.3f" % (time.time() - t)
And here's class_iter.py
:
#!/usr/bin/env python
import time
x = 0
class Generator(object):
def __init__(self, num):
self.start = 0
self.end = num
def __iter__(self):
return self
def next(self):
if self.start == self.end:
raise StopIteration
else:
self.start = self.start + 1
return self.start
t = time.time()
gen = Generator(100000)
for i in gen:
x = x + i
print "%.3f" % (time.time() - t)
I then ran each of them 10 times using this in bash (for class_iter.py
, for example):
for i in {1..10}; do ./class_iter.py; done
And here are the average running times for each of them:
class_iter.py: 0.0864
func_iter.py: 0.0307
Now, my questions are:
class_iter.py
take almost three times as long as func_iter.py
to run?EDIT: As Dacav suggested, I also tried running func_iter.py
using xrange
instead of range
. This decreases its average running time to 0.0263 seconds.
The class version spends lots of time accessing its own variables. Each self.whatever
costs cycles. If you define your __iter__
as a generator and minimize the use of instance variables, the difference between class and function versions will be negligible:
setup = """
def create_generator(num):
mylist = range(num)
for i in mylist:
yield i
class Generator(object):
def __init__(self, num):
self.start = 0
self.end = num
def __iter__(self):
return self
def next(self):
if self.start == self.end:
raise StopIteration
else:
self.start = self.start + 1
return self.start
class Generator2(object):
def __init__(self, num):
self.mylist = range(num)
def __iter__(self):
for i in self.mylist:
yield i
"""
import timeit
print timeit.timeit('for p in create_generator(1000):p', setup, number=1000)
print timeit.timeit('for p in Generator(1000):p', setup, number=1000)
print timeit.timeit('for p in Generator2(1000):p', setup, number=1000)
Results:
0.158941984177
0.696810007095
0.160784959793
so the second generator class is almost as fast as the function version.
Please do note that Generator
and Generator2
in the example are not fully equivalent, there are cases when you cannot simply replace a "plain" iterator with a generator (e.g. marshaling).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With