I need to find, process and remove (one by one) any substrings that match a rather long regex:
# p is a compiled regex
# s is a string
while 1:
    m = p.match(s)
    if m is None:
        break
    process(m.group(0))  # do something with the matched pattern
    s = re.sub(m.group(0), '', s)  # remove it from string s
The code above is not good for 2 reasons:
1. It doesn't work if m.group(0) happens to contain any regex-special characters (like *, +, etc.).
2. It feels like I'm duplicating the work: first I search the string for the regular expression, and then I have to kinda go look for it again to remove it.
What's a good way to do this?
Regular expressions can be used for various tasks in Python: search-and-replace operations, replacing patterns in text, and checking whether a string contains a specific pattern.
By default, re.sub() replaces every occurrence of the pattern in the target string. Passing count=1 replaces only the first occurrence; more generally, set count to the maximum number of replacements you want to perform.
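As a small illustration of the count parameter, here is a sketch using a made-up input string and pattern (neither comes from the question):

```python
import re

# Hypothetical input, chosen only for illustration.
text = "a1b22c333"

# With the default count=0, every run of digits is replaced.
all_replaced = re.sub(r"\d+", "#", text)

# With count=1, only the first run of digits is replaced.
first_only = re.sub(r"\d+", "#", text, count=1)

print(all_replaced)  # a#b#c#
print(first_only)    # a#b22c333
```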
To replace text that matches a regular expression rather than an exact substring, use sub() from the re module. re.sub() takes the regex pattern as its first argument, the replacement string as the second, and the string to process as the third.
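To make the difference between an exact match and a regex match concrete, here is a short sketch. The sample string and pattern are assumptions picked for the example:

```python
import re

# Hypothetical input, chosen only for illustration.
s = "cat, cot, cut"

# str.replace handles only an exact (literal) substring.
literal = s.replace("cat", "dog")

# re.sub(pattern, replacement, string): "c[ao]t" matches both "cat" and "cot".
by_regex = re.sub(r"c[ao]t", "dog", s)

print(literal)   # dog, cot, cut
print(by_regex)  # dog, dog, cut
```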
In .NET, to perform a substitution you use the Replace method of the Regex class rather than the Match method. Replace is similar to Match, except that it takes an extra string parameter for the replacement value.
The re.sub function can take a function as its replacement argument, so you can combine the replacement and processing steps if you wish:
# p is a compiled regex
# s is a string
def process_match(m):
    # Process the match here.
    return ''

s = p.sub(process_match, s)
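For a concrete, runnable version of the callback approach, here is a minimal sketch. The pattern, the input string, and the `processed` list are hypothetical stand-ins for the question's long regex and its process() function:

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
p = re.compile(r"\[(\w+)\]")
s = "keep [alpha] this [beta] text"

processed = []  # stand-in for whatever process() would do


def process_match(m):
    # Record the matched text; m is the match object for this occurrence.
    processed.append(m.group(0))
    # Returning '' removes the match from the resulting string.
    return ''


s = p.sub(process_match, s)

print(processed)  # ['[alpha]', '[beta]']
print(repr(s))    # 'keep  this  text'
```

Because the matched text is never turned back into a pattern, regex-special characters inside a match cause no trouble, and the string is scanned only once. One difference from the original loop: sub() finds all matches in a single left-to-right pass, so text that would only start matching after an earlier removal is not picked up. If you prefer to keep the loop, wrapping the matched text in re.escape() or slicing it out with m.start() and m.end() should avoid the special-character problem.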