
Python regular expressions: named groups and the "logical or" operator

Tags: python, regex

In Python regular expressions, named and unnamed groups are both defined with '(' and ')'. This leads to some surprising behavior. The regexp

"(?P<a>1)=(?P<b>2)"

used with the text "1=2" finds named group "a" with value "1" and named group "b" with value "2". But if I want to use the "logical or" operator to combine multiple rules, the following regexp:

"((?P<a>1)=(?P<b>2))|(?P<c>3)"

used with the same text "1=2" also reports an unnamed group with value "1=2". I understand that the regexp engine treats the "(" and ")" enclosing groups "a" and "b" as an unnamed group and reports it as matched. But I don't want unnamed groups to be reported; I just want to use "|" to "glue" multiple regexps together without creating any parasitic unnamed groups. Is there a way to do this in Python?
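For reference, here is a minimal sketch that reproduces the behavior described above. It assumes Python 3 and the standard re module; the variable names are only illustrative.

```python
import re

text = "1=2"

# Pattern from the question: the outer parentheses form an unnamed
# capturing group that wraps the named groups "a" and "b".
pattern = re.compile(r"((?P<a>1)=(?P<b>2))|(?P<c>3)")
match = pattern.match(text)

# groups() lists every capturing group, named or not, ordered by the
# position of its opening parenthesis.
print(match.groups())     # ('1=2', '1', '2', None)

# groupdict() lists only the named groups.
print(match.groupdict())  # {'a': '1', 'b': '2', 'c': None}
```

The unnamed group shows up as the first entry of groups(), while groupdict() already contains only the named groups.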

grigoryvp asked May 24 '09 11:05


1 Answer

Use (?:...), a non-capturing group, to get rid of the unnamed group:

r"(?:(?P<a>1)=(?P<b>2))|(?P<c>3)"

From the documentation of re:

(?:...) A non-grouping version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
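As a rough illustration of the difference, the sketch below compiles both the original pattern and the (?:...) version and compares what they report. It assumes Python 3; the names capturing and non_capturing are made up for this example.

```python
import re

# Original pattern: the outer parentheses are a capturing group.
capturing = re.compile(r"((?P<a>1)=(?P<b>2))|(?P<c>3)")

# Same pattern with the outer parentheses made non-capturing.
non_capturing = re.compile(r"(?:(?P<a>1)=(?P<b>2))|(?P<c>3)")

# Pattern.groups is the number of capturing groups in the pattern.
print(capturing.groups)      # 4
print(non_capturing.groups)  # 3

m = non_capturing.match("1=2")
print(m.groups())     # ('1', '2', None)
print(m.groupdict())  # {'a': '1', 'b': '2', 'c': None}

# The whole match is still available as group 0.
print(m.group(0))     # 1=2
```

Only the three named groups remain, and the full matched text "1=2" can still be read from group(0) if it is needed.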

By the way, the alternation operator | has very low precedence in order to make parentheses unnecessary in cases like yours. You can drop the extra parentheses in your regex and it will continue to work as expected:

r"(?P<a>1)=(?P<b>2)|(?P<c>3)"
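A quick way to check this claim is to run both forms against the same inputs. The sketch below assumes Python 3.4 or later (for re.fullmatch); the variable names and the extra "x=1|2" patterns are illustrative and not part of the original question.

```python
import re

with_parens = re.compile(r"(?:(?P<a>1)=(?P<b>2))|(?P<c>3)")
without_parens = re.compile(r"(?P<a>1)=(?P<b>2)|(?P<c>3)")

# Both forms should report the same named groups for either alternative.
results = {}
for text in ("1=2", "3"):
    left = with_parens.fullmatch(text).groupdict()
    right = without_parens.fullmatch(text).groupdict()
    results[text] = (left, right)
    print(text, left == right, right)

# Because "|" binds so loosely, it splits the WHOLE pattern in two.
# Grouping is needed only when alternating part of a sequence.
whole_split = re.fullmatch(r"x=1|2", "2")        # alternatives: "x=1" or "2"
part_split = re.fullmatch(r"x=(?:1|2)", "2")     # requires the "x=" prefix
print(whole_split is not None)  # True
print(part_split is not None)   # False
```

The last two lines show the other side of the low precedence: without grouping, everything to the left of "|" is one alternative and everything to the right is the other.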
Ayman Hourieh answered Sep 28 '22 06:09