 

Python regex findall alternation behavior

I'm using Python 2.7.6. I can't understand the following result from re.findall:

>>> import re
>>> re.findall(r'\d|\(\d,\d\)', '(6,7)')
['(6,7)']

I expected the above to return ['6', '7'], because according to the documentation:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].
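The last part of that quote is easiest to see when two branches can both match at the same starting position. In this minimal sketch (the strings 'a' and 'ab' are only illustrative), the branch listed first wins even though the other branch would give a longer match:

```python
import re

# At position 0 of 'ab' both branches could match; '|' accepts the first
# branch that succeeds, not the longest one.
first_short = re.match(r'a|ab', 'ab').group(0)
first_long = re.match(r'ab|a', 'ab').group(0)

print(first_short)  # a
print(first_long)   # ab
```

Note that this ordering only decides between branches that start at the same position; it does not control where in the string the scan finds its first match.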

Thanks for your help

Mayank asked Sep 27 '22 07:09


1 Answer

As mentioned in the documentation:

This means that once A matches, B will not be tested further, even if it would produce a longer overall match.

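So in this case the regex engine, which scans the string from left to right and tries the alternatives in order at each position, cannot match \d at index 0, because your string starts with ( and not a digit. It therefore tries the second alternative, \(\d,\d\), which matches the whole string (6,7). Since findall returns non-overlapping matches, the 6 and 7 inside that match have already been consumed and are never matched on their own. If your string started with a digit, \d would match first: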

>>> re.findall(r'\d|\d,\d\)', '6,7)')
['6', '7']
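The scan described above can be modelled in a few lines of plain Python. The helper findall_model below is hypothetical (it is not part of the re module) and is only a sketch, assuming a pattern without capture groups: it tries a match at each position with the compiled pattern's match(text, pos) and, after a match, resumes where that match ended. Its results agree with re.findall for both examples in this thread:

```python
import re


def findall_model(pattern, text):
    """Simplified, hypothetical model of the left-to-right scan re.findall does.

    Illustration only: it ignores capture groups and some empty-match
    corner cases that the real re.findall handles.
    """
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        # Try to match starting exactly at `pos`. Inside the pattern, the
        # alternatives are tried left to right and the first success wins.
        m = regex.match(text, pos)
        if m is None:
            pos += 1  # no match can start here, so move one character on
            continue
        results.append(m.group(0))
        # Resume after the match, so characters it consumed are never
        # revisited; step forward by one after an empty match to avoid looping.
        pos = m.end() if m.end() > pos else pos + 1
    return results


# The question's case: only the second branch can start at index 0,
# and it consumes the whole string, digits included.
question = findall_model(r'\d|\(\d,\d\)', '(6,7)')
# The answer's case: both branches could start at index 0, but '\d' is
# listed first, so it wins and the scan resumes at index 1.
answer = findall_model(r'\d|\d,\d\)', '6,7)')

print(question)  # ['(6,7)']
print(answer)    # ['6', '7']
```

Reordering the branches to \(\d,\d\)|\d would not change the result for '(6,7)': the match still starts at index 0 and still covers the whole string.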
Mazdak answered Oct 26 '22 07:10