I want to split a string like:
'aaabbccccabbb'
into
['aaa', 'bb', 'cccc', 'a', 'bbb']
What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.
That is exactly the use case for itertools.groupby :)
>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
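If you need this in more than one place, the one-liner above can be wrapped in a small function. This is only a sketch: the names split_runs and run_lengths are made up for illustration and are not part of itertools. It relies on groupby grouping only consecutive equal items, which is why the second run of b's comes out as a separate group.

```python
from itertools import groupby


def split_runs(s):
    """Split s into maximal runs of identical consecutive characters."""
    # groupby yields (key, group_iterator) once per run of equal items
    return [''.join(group) for _, group in groupby(s)]


def run_lengths(s):
    """Return (character, run length) pairs, one per run in s."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(s)]


print(split_runs('aaabbccccabbb'))   # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(run_lengths('aaabbccccabbb'))  # [('a', 3), ('b', 2), ('c', 4), ('a', 1), ('b', 3)]
print(split_runs(''))                # []
```

An empty string simply gives an empty list, so no special case is needed.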
You can also write a generator yourself, without trying to be clever just to keep the code short and unreadable:
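def yield_same(string):
    it_str = iter(string)
    # next() with a default avoids StopIteration on an empty string
    result = next(it_str, "")
    if not result:
        return
    for next_chr in it_str:
        if next_chr != result[0]:
            # the current run has ended: emit it and start a new one
            yield result
            result = ""
        result += next_chr
    # emit the final run
    yield result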
>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']
Edit: OK, so there is itertools.groupby, which probably does something like this.
Here's the best way I could find using regex:
import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])
>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
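One caveat with the regex versions: \w only matches word characters, so runs of spaces or punctuation are silently dropped. That is fine for a string of a's, b's and c's, but if the input can contain anything, a variant along these lines may be closer to what you want. It is a sketch only; split_runs_regex is a made-up name, and using (.) with re.DOTALL is my assumption about what "any character" should mean here.

```python
import re


def split_runs_regex(s):
    """Split s into runs of identical consecutive characters using a regex."""
    # (.) captures any single character; \1* matches further repeats of it.
    # re.DOTALL lets '.' match newline characters as well.
    return [m.group() for m in re.finditer(r'(.)\1*', s, re.DOTALL)]


print(split_runs_regex('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs_regex('aa  b!!'))        # ['aa', '  ', 'b', '!!']
```

With the \w pattern the second input would give only ['aa', 'b'].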