 

Split string into strings of repeating elements

Tags:

python

I want to split a string like:

'aaabbccccabbb'

into

['aaa', 'bb', 'cccc', 'a', 'bbb']

What's an elegant way to do this in Python? If it makes it easier, it can be assumed that the string will only contain a's, b's and c's.

asked Feb 29 '12 19:02 by Colin


4 Answers

That is the use case for itertools.groupby :)

>>> from itertools import groupby
>>> s = 'aaabbccccabbb'
>>> [''.join(y) for _,y in groupby(s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
answered Nov 02 '22 17:11 by Niklas B.
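The groupby one-liner throws away the key (the _ in the comprehension). The following short sketch keeps it, using the question's string, to show what groupby actually yields: one (key, group) pair per run. It also shows why 'a' and 'b' each appear twice. groupby only merges consecutive equal items and does not sort or regroup them.

```python
from itertools import groupby

s = 'aaabbccccabbb'
# groupby yields (key, group) pairs; each group is an iterator over one run
pairs = [(key, ''.join(group)) for key, group in groupby(s)]
print(pairs)
# [('a', 'aaa'), ('b', 'bb'), ('c', 'cccc'), ('a', 'a'), ('b', 'bbb')]
```

Each group iterator is only valid until groupby advances to the next run, so it has to be consumed (here with ''.join) inside the loop.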


You can write a generator yourself, without trying to be clever by making it short and unreadable:

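def yield_same(string):
    """Yield each run of identical consecutive characters in string."""
    it_str = iter(string)
    # next() with a default avoids StopIteration on an empty string
    result = next(it_str, "")
    if not result:
        return
    for next_chr in it_str:
        if next_chr != result[0]:
            # the current run has ended: emit it and start a new one
            yield result
            result = ""
        result += next_chr
    yield result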


>>> list(yield_same("aaaaaabcbcdcdccccccdddddd"))
['aaaaaa', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'cccccc', 'dddddd']

Edit: OK, so there is itertools.groupby, which does essentially the same thing.

answered Nov 02 '22 18:11 by jsbueno
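To make the comparison with itertools.groupby concrete, here is a simplified sketch of the same idea generalised to any iterable. It collects consecutive equal items into lists and checks the result against groupby on the question's string. This is only an illustration of the behaviour. It is not how groupby is actually implemented, and the name runs is a hypothetical helper.

```python
from itertools import groupby

def runs(iterable):
    """Yield lists of consecutive equal items (hypothetical helper, simplified)."""
    run = []
    for item in iterable:
        # a different item ends the current run
        if run and item != run[-1]:
            yield run
            run = []
        run.append(item)
    if run:  # emit the final run; nothing is yielded for empty input
        yield run

# Cross-check against itertools.groupby on the question's string
s = 'aaabbccccabbb'
assert list(runs(s)) == [list(g) for _, g in groupby(s)]
print([''.join(r) for r in runs(s)])  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
```

Because it never indexes into the input, the same function works on lists or other iterables, not just strings.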


Here's the best way I could find using regex:

import re
print([a for a, b in re.findall(r"((\w)\2*)", s)])
answered Nov 02 '22 18:11 by Jacob Eggers
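The comprehension in the findall answer unpacks two values because the pattern has two capturing groups. Group 1 is the whole run. Group 2 is the single character that \2 refers back to. With more than one group, re.findall returns a tuple per match. Printing the raw result for the question's string makes this visible:

```python
import re

s = 'aaabbccccabbb'
# Two capturing groups, so findall returns (group 1, group 2) tuples:
# group 1 is the whole run, group 2 is the single repeated character
matches = re.findall(r"((\w)\2*)", s)
print(matches)
# [('aaa', 'a'), ('bb', 'b'), ('cccc', 'c'), ('a', 'a'), ('bbb', 'b')]

# Keep only the run from each tuple
runs = [run for run, _char in matches]
```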


>>> import re
>>> s = 'aaabbccccabbb'
>>> [m.group() for m in re.finditer(r'(\w)(\1*)',s)]
['aaa', 'bb', 'cccc', 'a', 'bbb']
answered Nov 02 '22 17:11 by jamylak
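Both regex answers use \w, which only matches word characters (letters, digits and underscore). That is fine for the a/b/c strings in the question. Runs of spaces, punctuation or newlines would be skipped, though. For example, findall with ((\w)\2*) on 'aa  !!' finds only 'aa'. If arbitrary characters matter, one option is to capture any character with (.) and pass re.DOTALL so that '.' also matches newlines. This is a sketch under that assumption, and split_runs is a hypothetical name:

```python
import re

def split_runs(s):
    # (.) captures any one character; re.DOTALL lets '.' match '\n' as well.
    # \1* then matches any further copies of that same character.
    return [m.group() for m in re.finditer(r'(.)\1*', s, re.DOTALL)]

print(split_runs('aaabbccccabbb'))  # ['aaa', 'bb', 'cccc', 'a', 'bbb']
print(split_runs('aa  !!'))         # ['aa', '  ', '!!']
```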