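For doing a regex substitution, there are three things that you give it:

- the match pattern
- the replacement pattern
- the original string

There are three things that the regex engine finds that are of interest to me:

- the matched string
- the replacement string
- the final resultant string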
When using re.sub, the final string is what's returned. But is it possible to access the other two things, the matched string and the replacement string?
Here's an example:
```python
import re

orig = "This is the original string."
matchpat = "(orig.*?l)"
replacepat = "not the \\1"
final = re.sub(matchpat, replacepat, orig)
print(final)  # This is the not the original string.
```
The match string is "original" and the replacement string is "not the original". Is there a way to get them? I'm writing a script to search and replace in many files, and I want it to print what it's finding and replacing, without printing out the entire line.
Regular expressions can be used for various tasks in Python: search-and-replace operations, replacing patterns in text, and checking whether a string contains a specific pattern.
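As a small illustration of the last of those tasks, here is a minimal sketch of checking whether a string contains a pattern with re.search, using the question's example string (the second pattern is just an arbitrary one chosen to fail):

```python
import re

text = "This is the original string."

# re.search scans the whole string and returns a Match object, or None
# if the pattern occurs nowhere in it.
contains_orig = re.search(r"orig\w*", text) is not None
contains_digit = re.search(r"\d", text) is not None

print(contains_orig)   # True
print(contains_digit)  # False
```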
By default, re.sub() replaces every occurrence of the pattern in the target string. Passing count=1 to re.sub() replaces only the first occurrence; in general, set count to the maximum number of replacements you want performed.
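A short sketch of the count parameter, using a made-up string. It also shows re.subn, which returns the new string together with the number of substitutions made; that count can be handy in a multi-file search-and-replace script, although it does not expose the matched text itself.

```python
import re

text = "cat bat cat bat"

# Default count=0: every non-overlapping match is replaced.
all_replaced = re.sub(r"cat", "dog", text)

# count=1: only the first match is replaced.
first_only = re.sub(r"cat", "dog", text, count=1)

# re.subn returns a tuple: (new_string, number_of_substitutions_made).
new_text, n = re.subn(r"cat", "dog", text)

print(all_replaced)  # dog bat dog bat
print(first_only)    # dog bat cat bat
print(n)             # 2
```

Passing count as a keyword argument keeps the call unambiguous next to the optional flags parameter.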
Put a capture group around the part that you want to preserve, and then include a reference to that capture group within your replacement text. @Amber: I infer from your answer that unlike str.replace(), we can't use variables a) in raw strings; or b) as an argument to re.sub; or c) both.
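A minimal sketch of the capture-group technique, assuming the search term is held in a variable (`word` here is a hypothetical name). The pattern string is built from that variable and passed to re.sub like any other argument, and the replacement is a raw string containing the group reference \1:

```python
import re

word = "orig"  # hypothetical: a search term stored in a variable
text = "This is the original string."

# re.escape guards against regex metacharacters in the search term.
# The parentheses make everything from the term up to the next "l" group 1.
pattern = "(" + re.escape(word) + r".*?l)"

# \1 in the replacement is filled in with whatever group 1 matched.
result = re.sub(pattern, r"not the \1", text)
print(result)  # This is the not the original string.
```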
In JavaScript, the replace() method returns a new string with one, some, or all matches of a pattern replaced by a replacement. The pattern can be a string or a RegExp, and the replacement can be a string or a function called for each match.
```python
import re

class Replacement(object):
    def __init__(self, replacement):
        self.replacement = replacement
        self.matched = None
        self.replaced = None

    def __call__(self, match):
        # re.sub calls this once per match; remember the match and its expansion.
        self.matched = match.group(0)
        self.replaced = match.expand(self.replacement)
        return self.replaced

>>> repl = Replacement('not the \\1')
>>> re.sub('(orig.*?l)', repl, 'This is the original string.')
'This is the not the original string.'
>>> repl.matched
'original'
>>> repl.replaced
'not the original'
```
Edit: as @F.J has pointed out, the above will remember only the last match/replacement. This version handles multiple occurrences:
```python
import re

class Replacement(object):
    def __init__(self, replacement):
        self.replacement = replacement
        self.occurrences = []  # (matched, replaced) pairs, in match order

    def __call__(self, match):
        matched = match.group(0)
        replaced = match.expand(self.replacement)
        self.occurrences.append((matched, replaced))
        return replaced

>>> repl = Replacement('[\\1]')
>>> re.sub(r'\s(\d)', repl, '1 2 3')
'1[2][3]'
>>> for matched, replaced in repl.occurrences:
...     print(matched, '=>', replaced)
...
 2 => [2]
 3 => [3]
```
I looked at the documentation and it seems like you can pass a function reference into re.sub:
```python
import re

def re_sub_verbose(pattern, replace, string):
    def substitute(match):
        # Called by re.sub once per match: report what was found and its replacement.
        print('Matched:', match.group(0))
        print('Replacing with:', match.expand(replace))
        return match.expand(replace)

    result = re.sub(pattern, substitute, string)
    print('Final string:', result)
    return result
```
And I get this output when running re_sub_verbose("(orig.*?l)", "not the \\1", "This is the original string."):
```
Matched: original
Replacing with: not the original
Final string: This is the not the original string.
```
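For the multi-file use case in the question, it may help to preview the matches before rewriting anything. The sketch below is a hypothetical helper (`preview_substitutions` is not a library function): it uses re.finditer, which for ordinary patterns yields the same non-overlapping matches re.sub would replace, and Match.expand to build each replacement string, without modifying the input.

```python
import re

def preview_substitutions(pattern, replace, string):
    """Hypothetical helper: return (matched, replacement) pairs, no substitution.

    match.expand applies the replacement template (e.g. \\1) to each match,
    the same way re.sub does for a string replacement.
    """
    return [(m.group(0), m.expand(replace)) for m in re.finditer(pattern, string)]

pairs = preview_substitutions(r"(orig.*?l)", r"not the \1", "This is the original string.")
for matched, replaced in pairs:
    print(f"{matched!r} => {replaced!r}")
# 'original' => 'not the original'
```

Because it only collects pairs, the same helper could be run over each line of each file as a dry run, with re.sub applied afterwards once the output looks right.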