
How to replace repeated instances of a character with a single instance of that character in python

I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example if the string is "***abc**de*fg******h", I want it to get converted to "*abc*de*fg*h".

I'm pretty new to python (and programming in general) and tried to use regular expressions and string.replace() like:

    import re
    pattern = "***abc**de*fg******h"
    pattern.replace("*"\*, "*")

where \* is supposed to replace all instances of the "*" character. But I got: SyntaxError: unexpected character after line continuation character.

I also tried to manipulate it with a for loop like:

    def convertString(pattern):
        for i in range(len(pattern)-1):
            if(pattern[i] == pattern[i+1]):
                pattern2 = pattern[i]
        return pattern2

but this has a bug: the result is only "*", because pattern2 = pattern[i] overwrites pattern2 on every match instead of building up a new string...

Any help would be appreciated.

NSchrading asked Oct 07 '10 03:10


1 Answer

The naive way to do this kind of thing with re is

    re.sub(r'\*+', '*', text)

That replaces runs of 1 or more asterisks with one asterisk. For runs of exactly one asterisk, that is running very hard just to stay still. Much better is to replace runs of TWO or more asterisks by a single asterisk:

    re.sub(r'\*\*+', '*', text)

This can be well worth doing:

    \python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*+', '*', t)"
    10000 loops, best of 3: 73.2 usec per loop

    \python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*\*+', '*', t)"
    100000 loops, best of 3: 8.9 usec per loop
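The same comparison can be reproduced from inside a script with the timeit module. This is only a rough sketch: the function names collapse_one_or_more and collapse_two_or_more are made up for illustration, the patterns are precompiled (the command lines above pass the pattern string to re.sub each time), and absolute timings will vary with the machine and Python version.

```python
import re
import timeit

# Same test string as the command-line benchmark: every run of '*' has length 1,
# so there is nothing to collapse.
t = 'a*' * 100

ONE_OR_MORE = re.compile(r'\*+')    # matches every single asterisk too
TWO_OR_MORE = re.compile(r'\*\*+')  # matches only runs that need collapsing


def collapse_one_or_more(s):
    """Hypothetical helper: replace runs of 1+ asterisks with one asterisk."""
    return ONE_OR_MORE.sub('*', s)


def collapse_two_or_more(s):
    """Hypothetical helper: replace runs of 2+ asterisks with one asterisk."""
    return TWO_OR_MORE.sub('*', s)


if __name__ == '__main__':
    for fn in (collapse_one_or_more, collapse_two_or_more):
        # Total seconds for 1000 calls; compare the two numbers, not their absolute size.
        secs = timeit.timeit(lambda: fn(t), number=1000)
        print(fn.__name__, secs)
```

Both functions return the same string for any input; only the amount of work done on runs of length one differs.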

Note that when re.sub finds no matches, it returns a reference to the input string instead of building a whole new string, saving more wear and tear on your computer.
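Putting it together on the string from the question, as a minimal sketch. The function name collapse_asterisks is made up for illustration, and the identity check at the end reflects CPython behaviour as described above rather than a documented guarantee, so no expected output is shown for it.

```python
import re


def collapse_asterisks(text):
    """Hypothetical helper: replace each run of two or more '*' with a single '*'."""
    return re.sub(r'\*\*+', '*', text)


print(collapse_asterisks("***abc**de*fg******h"))  # prints *abc*de*fg*h

# A string with no run of two or more asterisks comes back with equal contents.
unchanged = "a*b*c"
result = collapse_asterisks(unchanged)
print(result == unchanged)  # prints True

# Whether it is the very same object is an implementation detail (assumption: CPython).
print(result is unchanged)
```

The raw string prefix (r'...') keeps the backslashes intact for the regex engine, so Python does not try to interpret \* as a string escape.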

John Machin answered Oct 12 '22 23:10
