The documentation for itertools provides a recipe for a pairwise() function, which I've slightly modified below so that it returns (last_item, None) as the final pair:
from itertools import tee, izip_longest

def pairwise_tee(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip_longest(a, b)
However, it seemed to me that using tee() might be overkill (given that it's only being used to provide one step of look-ahead), so I tried writing an alternative that avoids it:
def pairwise_zed(iterator):
    a = next(iterator)
    for b in iterator:
        yield a, b
        a = b
    yield a, None
Note: it so happens that I know my input will be an iterator for my use case; I'm aware that the function above won't work with a regular iterable. The requirement to accept an iterator is also why I'm not using something like izip_longest(iterable, iterable[1:]), by the way.
Testing both functions for speed gave the following results in Python 2.7.3:
>>> import random, string, timeit
>>> for length in range(0, 61, 10):
...     text = "".join(random.choice(string.ascii_letters) for n in range(length))
...     for variant in "tee", "zed":
...         test_case = "list(pairwise_%s(iter('%s')))" % (variant, text)
...         setup = "from __main__ import pairwise_%s" % variant
...         result = timeit.repeat(test_case, setup=setup, number=100000)
...         print "%2d %s %r" % (length, variant, result)
...     print
...
0 tee [0.4337780475616455, 0.42563915252685547, 0.42760396003723145]
0 zed [0.21209311485290527, 0.21059393882751465, 0.21039700508117676]
10 tee [0.4933490753173828, 0.4958930015563965, 0.4938509464263916]
10 zed [0.32074403762817383, 0.32239794731140137, 0.32340312004089355]
20 tee [0.6139161586761475, 0.6109561920166016, 0.6153261661529541]
20 zed [0.49281787872314453, 0.49651598930358887, 0.4942781925201416]
30 tee [0.7470319271087646, 0.7446520328521729, 0.7463529109954834]
30 zed [0.7085139751434326, 0.7165200710296631, 0.7171430587768555]
40 tee [0.8083810806274414, 0.8031280040740967, 0.8049719333648682]
40 zed [0.8273730278015137, 0.8248250484466553, 0.8298079967498779]
50 tee [0.8745720386505127, 0.9205660820007324, 0.878741979598999]
50 zed [0.9760301113128662, 0.9776301383972168, 0.978381872177124]
60 tee [0.9913749694824219, 0.9922418594360352, 0.9938201904296875]
60 zed [1.1071209907531738, 1.1063809394836426, 1.1069209575653076]
... so, it turns out that pairwise_tee() starts to outperform pairwise_zed() when there are about forty items. That's fine, as far as I'm concerned - on average, my input is likely to be under that threshold.
My question is: which should I use? pairwise_zed() looks like it'll be a little faster (and to my eyes is slightly easier to follow), but pairwise_tee() could be considered the "canonical" implementation by virtue of being taken from the official docs (to which I could link in a comment), and will work for any iterable - which isn't a consideration at this point, but I suppose could be later.
I was also wondering about potential gotchas if the iterator is interfered with outside the function, e.g.
for a, b in pairwise(iterator):
    # do something
    q = next(iterator)
... but as far as I can tell, pairwise_zed() and pairwise_tee() behave identically in that situation (and of course it would be a damn fool thing to do in the first place).
The itertools tee implementation is idiomatic for those experienced with itertools, though I'd be tempted to use islice instead of next to advance the leading iterator.
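To illustrate the islice suggestion, here is a minimal sketch. The name pairwise_islice is made up for this example, and the import fallback is only there so the snippet runs on both Python 2 and Python 3:

```python
from itertools import islice, tee

try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3


def pairwise_islice(iterable):
    # Hypothetical variant of the tee-based recipe from the question.
    a, b = tee(iterable)
    # islice(b, 1, None) lazily skips the first item of b,
    # replacing the explicit next(b, None) call.
    return izip_longest(a, islice(b, 1, None))


print(list(pairwise_islice(iter("abc"))))
```

One reason to prefer this form is that skipping k items is just islice(b, k, None), so the same pattern carries over if you later need a longer look-ahead.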
A disadvantage of your version is that it's harder to extend to n-wise iteration, as your state is stored in local variables; I'd be tempted to use a deque:
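import collections
from itertools import chain, islice, repeat

def pairwise_deque(iterator, n=2):
    # Pad the tail with None so the last item still starts a full n-tuple.
    it = chain(iterator, repeat(None, n - 1))
    # Pre-fill the sliding window with the first n - 1 items.
    d = collections.deque(islice(it, n - 1), maxlen=n)
    for a in it:
        d.append(a)
        yield tuple(d)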
A useful idiom is calling iter() on the iterator parameter; this is an easy way to ensure your function works on any iterable, not just on iterators.
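Applied to your generator, the idiom might look something like the sketch below. The name pairwise_any and the sentinel guard are my additions, not part of your code; the guard is there because on Python 3.7+ a StopIteration escaping from a bare next() inside a generator is turned into a RuntimeError (PEP 479), whereas on Python 2 it just ends the generator quietly.

```python
def pairwise_any(iterable):
    # iter() is a no-op on iterators and builds a fresh iterator for
    # lists, strings and other iterables.
    it = iter(iterable)
    # Hypothetical guard: a unique sentinel detects empty input without
    # letting StopIteration escape from the generator.
    missing = object()
    a = next(it, missing)
    if a is missing:
        return
    for b in it:
        yield a, b
        a = b
    yield a, None


print(list(pairwise_any("abc")))        # a plain iterable
print(list(pairwise_any(iter("abc"))))  # an iterator
```

Both calls should produce the same pairs, ending with ('c', None).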
This is a subjective question; both versions are fine.
I would use tee, because it looks simpler to me: I know what tee does, so the first is immediately obvious, whereas with the second I have to think a little about the order in which you overwrite a at the end of each loop. The timings are small enough to be probably irrelevant, but you're the judge of that.
Regarding your other question, from the tee docs:
"Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed."
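If you want to see what that means in practice, a small experiment along these lines could help. It is only a sketch: interfered() is a helper I made up, the import fallback and the sentinel guard in pairwise_zed are there so it also runs on Python 3, and the outcome depends on how CPython's tee buffers items.

```python
from itertools import tee

try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3


def pairwise_tee(iterable):
    a, b = tee(iterable)
    next(b, None)
    return izip_longest(a, b)


def pairwise_zed(iterator):
    # Same logic as in the question, plus a guard for empty input.
    missing = object()
    a = next(iterator, missing)
    if a is missing:
        return
    for b in iterator:
        yield a, b
        a = b
    yield a, None


def interfered(pairwise, n=6):
    # Hypothetical helper: advance the source iterator inside the loop,
    # which is exactly what the docs warn against.
    it = iter(range(n))
    seen = []
    for pair in pairwise(it):
        seen.append(pair)
        next(it, None)  # silently swallows one item from the source
    return seen


print(interfered(pairwise_tee))
print(interfered(pairwise_zed))
```

On CPython I would expect both lines to print [(0, 1), (1, 3), (3, 5), (5, None)]: every other item is lost, and neither version has any way of noticing, which matches your observation that the two behave the same here.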