I have a sentence as a list of words, and I'm trying to extract all the bigrams (i.e. all the consecutive 2-tuples of words) from it. So, if my sentence was
['To', 'sleep', 'perchance', 'to', 'dream']
I want to get back out
[('To', 'sleep'), ('sleep', 'perchance'), ('perchance', 'to'), ('to', 'dream')]
Currently, I'm using
zip([sentence[i] for i in range(len(sentence) - 1)], [sentence[i+1] for i in range(len(sentence) - 1)])
and then iterating over this, but I can't help thinking there are more Pythonic ways of doing this.
You're on the right track with zip. I suggest using list slicing instead of comprehensions.
seq = ['To', 'sleep', 'perchance', 'to', 'dream']
print(list(zip(seq, seq[1:])))
Result:
[('To', 'sleep'), ('sleep', 'perchance'), ('perchance', 'to'), ('to', 'dream')]
Note that the arguments to zip don't have to be the same length: zip stops once the shortest one is exhausted, so it's fine that seq is longer than seq[1:].
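To see that truncation behaviour on edge cases, here is a small sketch in Python 3 syntax. The helper name bigrams is my own, not something from a library.

```python
def bigrams(words):
    """Return consecutive word pairs using zip and a slice shifted by one."""
    # zip stops at the shorter argument (words[1:]), so the last word
    # is never paired with a missing neighbour.
    return list(zip(words, words[1:]))

seq = ['To', 'sleep', 'perchance', 'to', 'dream']
print(bigrams(seq))
print(bigrams(['dream']))  # a single word has no bigrams
print(bigrams([]))         # neither does an empty sentence
```

A sentence of n words gives n - 1 pairs, and sentences with fewer than two words give an empty list rather than an error.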
Here's one I prepared earlier. It's from the itertools recipes section in the official Python docs.
from itertools import tee

def pairwise(iterable):
    """Iterate in pairs

    >>> list(pairwise([0, 1, 2, 3]))
    [(0, 1), (1, 2), (2, 3)]
    >>> tuple(pairwise([])) == tuple(pairwise('x')) == ()
    True
    """
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
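Applied to the sentence from the question, this might look like the sketch below (Python 3, with the recipe restated so the block runs on its own). The variable names are just for illustration. Unlike the slicing approach, it also works on iterators that can't be sliced, such as generators. On Python 3.10 and later, itertools.pairwise is available in the standard library and does the same job.

```python
from itertools import tee

def pairwise(iterable):
    # Two independent iterators over the same data; advance the second by one.
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

sentence = ['To', 'sleep', 'perchance', 'to', 'dream']
sentence_bigrams = list(pairwise(sentence))
print(sentence_bigrams)

# A generator has no slicing support, but pairwise handles it fine.
lowercase_words = (word.lower() for word in sentence)
lowercase_bigrams = list(pairwise(lowercase_words))
print(lowercase_bigrams)
```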