
Python re.sub() is replacing the full match even when using non-capturing groups [duplicate]

Tags: python, regex

I believe that re.sub() is replacing the full match, but in this case I only want to replace the text matched by the capturing groups and leave the text matched by the non-capturing groups untouched. How can I do this?

import re

string = 'aBCDeFGH'

print(re.sub(r'(a)?(?:[A-Z]{3})(e)?(?:[A-Z]{3})', '+', string))

The output is:

+

Expected output is:

+BCD+FGH
Darwin asked Mar 28 '18 07:03




1 Answer

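re.sub() always replaces the whole match; a non-capturing group only keeps its text out of the numbered groups, it does not keep it out of the match. So capture the parts you want to keep and put them back in the replacement. The general solution for such problems is to pass a function (here a lambda) as the replacement:

import re

string = 'aBCDeFGH'

# Groups 2 and 4 hold the uppercase runs that must survive the substitution.
print(re.sub(r'(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))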
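One caveat with the lambda: it always emits both `+` signs, even when the optional `(a)?` or `(e)?` group did not take part in the match. Below is a minimal sketch of a stricter variant, assuming you only want a `+` where a captured group actually matched. The names `PATTERN` and `plus_for_captured` are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r'(a)?([A-Z]{3})(e)?([A-Z]{3})')

def plus_for_captured(match):
    # group(1) / group(3) are None when the optional group did not participate.
    first = '+' if match.group(1) is not None else ''
    second = '+' if match.group(3) is not None else ''
    # Groups 2 and 4 are mandatory, so they are always strings here.
    return first + match.group(2) + second + match.group(4)

print(PATTERN.sub(plus_for_captured, 'aBCDeFGH'))  # +BCD+FGH
print(PATTERN.sub(plus_for_captured, 'BCDFGH'))    # BCDFGH
```

A named function is used instead of a lambda only because the conditional logic reads more clearly that way; re.sub() accepts any callable that takes the match object and returns a string.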

However, as bro-grammer has commented, you can use backreferences in this case:

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
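To see the difference side by side, here is a small sketch that runs the original attempt and the backreference version on the same input. The last line is an alternative that is not from the answer: it uses a lookahead so that the uppercase letters are never part of the match. It is a looser pattern (it replaces any `a` or `e` followed by three uppercase letters), so treat it as an assumption about what you want rather than a drop-in equivalent.

```python
import re

string = 'aBCDeFGH'

# Non-capturing groups are still part of the match, so everything is replaced.
whole = re.sub(r'(a)?(?:[A-Z]{3})(e)?(?:[A-Z]{3})', '+', string)
print(whole)  # +

# Backreferences \2 and \4 put the captured uppercase runs back.
backrefs = re.sub(r'(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string)
print(backrefs)  # +BCD+FGH

# Lookahead variant: (?=...) checks the following text without consuming it,
# so only the 'a' or 'e' itself is matched and replaced.
lookahead = re.sub(r'[ae](?=[A-Z]{3})', '+', string)
print(lookahead)  # +BCD+FGH
```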
pts answered Nov 12 '22 17:11