
How do I coalesce a sequence of identical characters into just one?

Suppose I have this:

My---sun--is------very-big---.

I want to replace every run of multiple hyphens with a single hyphen.

TIMEX asked May 11 '10 19:05


3 Answers

import re

astr = 'My---sun--is------very-big---.'

# '-+' matches a run of one or more hyphens; each run becomes a single hyphen.
print(re.sub(r'-+', '-', astr))
# My-sun-is-very-big-.
unutbu answered Sep 21 '22 11:09


If you want to collapse any run of a repeated character into a single occurrence, you can use:

>>> import re
>>> a = "AA---BC++++DDDD-EE$$$$FF"
>>> print(re.sub(r"(.)\1+",r"\1",a))
A-BC+D-E$F

If you only want to coalesce non-word characters (anything other than letters, digits, and the underscore), use:

>>> print(re.sub(r"(\W)\1+",r"\1",a))
AA-BC+DDDD-EE$FF

If it's really just hyphens, I recommend unutbu's solution.
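
Between those two extremes, here is a minimal sketch that coalesces only a chosen set of characters. The helper name `coalesce_chars` and its parameters are made up for illustration, and it assumes `chars` is non-empty. `re.escape` is applied so that characters such as `-` or `]` are treated literally inside the character class.

```python
import re

def coalesce_chars(text, chars):
    """Collapse runs of any character in `chars` into a single occurrence."""
    # Hypothetical helper: build a character class from `chars`, escaping it
    # so every character is matched literally. Assumes `chars` is non-empty.
    pattern = "([" + re.escape(chars) + r"])\1+"
    # \1 refers back to the captured character, so only runs of the *same*
    # character are collapsed (e.g. '-+' is left alone, '--' becomes '-').
    return re.sub(pattern, r"\1", text)

print(coalesce_chars("AA---BC++++DDDD-EE$$$$FF", "-+"))
# AA-BC+DDDD-EE$$$$FF
```

Only the hyphens and plus signs are collapsed here; the `$` and letter runs are untouched because they are not in `chars`.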

Tim Pietzcker answered Sep 21 '22 11:09


If you really only want to coalesce hyphens, use the other suggestions. Otherwise, you can write your own function that collapses every run of a repeated character, something like this:

>>> def coalesce(x):
...     n = []
...     for c in x:
...         if not n or c != n[-1]:
...             n.append(c)
...     return ''.join(n)
...
>>> coalesce('My---sun--is------very-big---.')
'My-sun-is-very-big-.'
>>> coalesce('aaabbbccc')
'abc'
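
As a possible alternative to the explicit loop (not part of the answer above, just a sketch of the same idea), `itertools.groupby` from the standard library groups adjacent equal items, so keeping one key per group gives the same result. The function name `coalesce_groupby` is only illustrative.

```python
from itertools import groupby

def coalesce_groupby(x):
    # groupby yields one (key, group) pair per run of equal adjacent items;
    # keeping only the keys leaves one character per run.
    return ''.join(key for key, _ in groupby(x))

print(coalesce_groupby('My---sun--is------very-big---.'))  # My-sun-is-very-big-.
print(coalesce_groupby('aaabbbccc'))  # abc
```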
FogleBird answered Sep 24 '22 11:09