 

Regex in python: is it possible to get the match, replacement, and final string?

Tags:

python

regex

For doing a regex substitution, there are three things that you give it:

  • The match pattern
  • The replacement pattern
  • The original string

There are three things that the regex engine finds that are of interest to me:

  • The matched string
  • The replacement string
  • The final processed string

When using re.sub, the final string is what's returned. But is it possible to access the other two things, the matched string and replacement string?

Here's an example:

```python
import re

orig = "This is the original string."
matchpat = "(orig.*?l)"
replacepat = "not the \\1"

final = re.sub(matchpat, replacepat, orig)
print(final)
# This is the not the original string.
```

The matched string is "original" and the replacement string is "not the original". Is there a way to get them? I'm writing a script to search and replace in many files, and I want it to print what it's finding and replacing, without printing out the entire line.
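For reference, all three pieces can also be collected without a callback by scanning the string twice: re.finditer yields each match object, and Match.expand builds the replacement from the same template syntax re.sub uses. This is a minimal sketch that assumes the input is small enough for two passes to be acceptable; the answers below avoid the second pass by doing the work inside re.sub itself.

```python
import re

orig = "This is the original string."
matchpat = r"(orig.*?l)"
replacepat = r"not the \1"

# Preview each (matched, replacement) pair without modifying the string.
# Match.expand() performs the same backslash substitution that re.sub() does.
pairs = [(m.group(0), m.expand(replacepat)) for m in re.finditer(matchpat, orig)]

# The final string still comes from re.sub(), which scans the string again.
final = re.sub(matchpat, replacepat, orig)

for matched, replaced in pairs:
    print(repr(matched), "=>", repr(replaced))
print(final)
```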

asked Feb 03 '12 at 20:02 by wch




2 Answers

```python
import re

class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.matched = None
        self.replaced = None

    def __call__(self, match):
        self.matched = match.group(0)
        self.replaced = match.expand(self.replacement)
        return self.replaced
```

```python
>>> repl = Replacement('not the \\1')
>>> re.sub('(orig.*?l)', repl, 'This is the original string.')
'This is the not the original string.'
>>> repl.matched
'original'
>>> repl.replaced
'not the original'
```

Edit: as @F.J has pointed out, the above will remember only the last match/replacement. This version handles multiple occurrences:

```python
class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.occurrences = []

    def __call__(self, match):
        matched = match.group(0)
        replaced = match.expand(self.replacement)
        self.occurrences.append((matched, replaced))
        return replaced
```

```python
>>> repl = Replacement('[\\1]')
>>> re.sub(r'\s(\d)', repl, '1 2 3')
'1[2][3]'
>>> for matched, replaced in repl.occurrences:
...     print(matched, '=>', replaced)
...
 2 => [2]
 3 => [3]
```
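If you also want to know where each replacement happened (for example, to print a few characters of context instead of the whole line), the same idea works with a closure instead of a class. In this sketch, sub_with_log is a hypothetical helper name and the (start, end, matched, replaced) tuple layout is just one possible choice; the offsets come from match.span() and refer to positions in the original string, not in the result.

```python
import re

def sub_with_log(pattern, repl, string, count=0, flags=0):
    """Hypothetical helper: like re.sub() with a template string, but also
    returns a list of (start, end, matched, replaced) tuples."""
    occurrences = []

    def callback(match):
        # Expand the template exactly as re.sub() would for this match.
        replaced = match.expand(repl)
        start, end = match.span()  # offsets in the original string
        occurrences.append((start, end, match.group(0), replaced))
        return replaced

    result = re.sub(pattern, callback, string, count=count, flags=flags)
    return result, occurrences
```

With the example above, sub_with_log(r'\s(\d)', r'[\1]', '1 2 3') returns ('1[2][3]', [(1, 3, ' 2', '[2]'), (3, 5, ' 3', '[3]')]).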
answered Sep 21 '22 at 15:09 by Jakub Roztocil



I looked at the documentation, and it seems you can pass a function to re.sub instead of a replacement string:

```python
import re

def re_sub_verbose(pattern, replace, string):
    def substitute(match):
        # Expand the template once and reuse it for printing and returning.
        replacement = match.expand(replace)
        print('Matched:', match.group(0))
        print('Replacing with:', replacement)
        return replacement

    result = re.sub(pattern, substitute, string)
    print('Final string:', result)
    return result
```

And I get this output when running re_sub_verbose("(orig.*?l)", "not the \\1", "This is the original string."):

```
Matched: original
Replacing with: not the original
Final string: This is the not the original string.
```
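For the multi-file use case in the question, the callback can also report a line number. This sketch assumes files are processed line by line (so a match cannot span lines); sub_lines_verbose and its label parameter are hypothetical names, not part of the re module.

```python
import re

def sub_lines_verbose(pattern, repl, lines, label="<input>"):
    """Hypothetical helper: apply the substitution to each line and print one
    report per replacement (label:lineno: matched => replaced), not the whole line."""
    regex = re.compile(pattern)
    new_lines = []
    for lineno, line in enumerate(lines, start=1):
        # report() is called immediately by regex.sub(), so it sees the current lineno.
        def report(match):
            replaced = match.expand(repl)
            print(f"{label}:{lineno}: {match.group(0)!r} => {replaced!r}")
            return replaced
        new_lines.append(regex.sub(report, line))
    return new_lines
```

For a real file you could pass the open file object as lines, for example with open(path) as f: new_lines = sub_lines_verbose(pattern, repl, f, label=path), and write new_lines back afterwards.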
answered Sep 22 '22 at 15:09 by Blender
