What is the most efficient way to find consecutively repeating strings in a Python list?
For example, suppose I have the list
["a", "a", "b", "c", "b","b","b"]. I want an output of something like: ["group of 2 a's found at index 0, group of 3 b's found at index 4'].
Is there a built in function to accomplish this task? I did find numpy.bincount, but that seems to only work on numeric values.
Thanks in advance for the help.
It’s funny that you should call it a group, because the function probably best-suited to this is itertools.groupby:
>>> import itertools
>>> items = ["a", "a", "b", "c", "b", "b", "b"]
>>> [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]
[('a', 2), ('b', 1), ('c', 1), ('b', 3)]
(sum(1 for _ in vs) is a count, by the way, since len doesn’t work on just any iterable, and len(list(…)) is wasteful.)
Getting the index is a little more complicated; I’d just do it using a loop.
import itertools
def group_with_index(l):
i = 0
for k, vs in itertools.groupby(l):
c = sum(1 for _ in vs)
yield (k, c, i)
i += c
This requires state information between elements of the loop so its not easy to do with a list comprehension. Instead you can keep track of last value in a loop:
groups = []
for i, val in enumerate(["a", "a", "b", "c", "b","b","b"]):
if i == 0:
cnt = 1
loc = i
last_val = val
elif val == last_val:
cnt += 1
else:
groups.append((cnt, last_val, loc))
cnt = 1
loc = i
last_val = val
for group in groups:
print("group of {0} {1}'s found at index {2}".format(*group)
Output:
group of 2 a's found at index 0
group of 1 b's found at index 2
group of 1 c's found at index 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With