
Handling backreferences to capturing groups in re.sub replacement pattern

Tags: python, regex

I want to take the string 0.71331, 52.25378 and return 0.71331,52.25378 - i.e. just look for a digit, a comma, a space and a digit, and strip out the space.

This is my current code:

    import re

    coords = '0.71331, 52.25378'
    coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
    print(coord_re)

But this gives me 0.7133,2.25378. What am I doing wrong?

Richard asked Nov 16 '11 19:11




2 Answers

You should be using raw strings for regex. Try the following:

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords) 

With your current code, the backslashes in your replacement string are escaping the digits, so you are replacing all matches with the equivalent of chr(1) + "," + chr(2):

    >>> '\1,\2'
    '\x01,\x02'
    >>> print('\1,\2')
    ,
    >>> print(r'\1,\2')   # this is what you actually want
    \1,\2

Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
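As a quick check of the two options above, here is a minimal sketch using the coordinates from the question. The variable names raw_repl, escaped_repl and fixed are only illustrative.

```python
import re

coords = '0.71331, 52.25378'

# The r prefix and doubled backslashes produce the same five characters:
# backslash, 1, comma, backslash, 2.
raw_repl = r"\1,\2"
escaped_repl = "\\1,\\2"
assert raw_repl == escaped_repl

# re.sub now receives real backreferences and substitutes the captured digits.
fixed = re.sub(r"(\d), (\d)", raw_repl, coords)
print(fixed)  # 0.71331,52.25378
```

Either spelling works; the raw string is usually easier to read.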

Andrew Clark answered Oct 21 '22 18:10



Python interprets the \1 as a character with ASCII value 1, and passes that to sub.

Use raw strings, in which Python doesn't interpret the \.

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords) 

This is covered right in the beginning of the re documentation, should you need more info.
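To see what sub actually receives in the question's code, it can help to look at the repr of the result instead of printing it. This is only a sketch; plain_repl and broken are illustrative names.

```python
import re

coords = '0.71331, 52.25378'

# Without the r prefix, Python converts \1 and \2 into the characters
# with values 1 and 2 before re.sub is ever called.
plain_repl = "\1,\2"
assert plain_repl == chr(1) + "," + chr(2)

# The matched text "1, 5" is replaced by those control characters,
# which are invisible when printed but show up in the repr.
broken = re.sub(r"(\d), (\d)", plain_repl, coords)
print(repr(broken))  # '0.7133\x01,\x022.25378'
```

The control characters do not display in most terminals, which is why the printed result looks like 0.7133,2.25378.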

Petr Viktorin answered Oct 21 '22 17:10
