
Python: how to cut off sequences of more than 2 equal characters in a string

I'm looking for an efficient way to change a string so that every sequence of more than 2 equal characters is cut off after the first 2.

Some input->output examples are:

hellooooooooo -> helloo
woooohhooooo -> woohhoo

I'm currently looping over the characters, but it's a bit slow. Does anyone have another solution (regexp or something else)?

EDIT: current code:

word_new = ""
for i in range(0, len(word) - 2):
    if not word[i] == word[i+1] == word[i+2]:
        word_new = word_new + word[i]
for i in range(len(word) - 2, len(word)):
    word_new = word_new + word[i]
Bart asked Nov 25 '10 14:11


1 Answer

Edit: after applying helpful comments

import re

def ReplaceThreeOrMore(s):
    # pattern to look for three or more repetitions of any character, including
    # newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL) 
    return pattern.sub(r"\1\1", s)
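As a quick check, here is a minimal sketch that runs the function above on the two examples from the question. The function is repeated so the snippet runs on its own; compiling the pattern once at module level is an assumption about how it would be used, not something the answer prescribes.

```python
import re

# Same pattern as above: any character (including newlines, because of
# re.DOTALL) followed by two or more copies of itself.
pattern = re.compile(r"(.)\1{2,}", re.DOTALL)

def ReplaceThreeOrMore(s):
    # Replace each run of three or more equal characters with exactly two.
    return pattern.sub(r"\1\1", s)

for word in ("hellooooooooo", "woooohhooooo"):
    print(word, "->", ReplaceThreeOrMore(word))
# hellooooooooo -> helloo
# woooohhooooo -> woohhoo
```

Runs of exactly two characters (the "ll" in "hello", the "hh" in the second example) do not match `{2,}` and are left untouched.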

Original response: try something like this:

import re

# look for a character followed by at least one repetition of itself.
pattern = re.compile(r"(\w)\1+")

# a function to perform the substitution we need:
def repl(matchObj):
   char = matchObj.group(1)
   return "%s%s" % (char, char)

>>> pattern.sub(repl, "Foooooooooootball")
'Football'
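If a regular expression is not required, the same truncation can also be sketched with itertools.groupby. This is an alternative, not the approach used in the answer above; the function name cap_runs and the limit parameter are hypothetical, and whether it is faster than the regex would have to be measured.

```python
from itertools import groupby

def cap_runs(s, limit=2):
    # groupby yields one (character, iterator) pair per run of equal
    # adjacent characters; keep at most `limit` characters from each run.
    return "".join(ch * min(len(list(run)), limit) for ch, run in groupby(s))

print(cap_runs("hellooooooooo"))  # helloo
print(cap_runs("woooohhooooo"))   # woohhoo
```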
bgporter answered Nov 14 '22 22:11