
Python regex fails when ? is involved in a repeating group

Tags: python, regex

I'm trying to match any number of comma-separated 7-character strings that can include digits, _ and ?.

x = re.compile(r"^([0-9_\?]{7})(,\1)*$")

>>> x.match("123456?")
<_sre.SRE_Match object at 0x0046C800>
>>> x.match("12345??")
<_sre.SRE_Match object at 0x023483C8>
>>> x.match("1234???")
<_sre.SRE_Match object at 0x0046C800>
>>> x.match("123????")
<_sre.SRE_Match object at 0x023483C8>
>>> x.match("12?????")
<_sre.SRE_Match object at 0x0046C800>
>>> x.match("1??????")
<_sre.SRE_Match object at 0x023483C8>
>>> x.match("???????")
<_sre.SRE_Match object at 0x0046C800>
>>> x.match("???????,1234567")
>>>

^^^^^^^^^^^^^^^^^^^^^^This is where it fails

vvvvvvvvvvvvvvvvvvvvvvBut repetition works if I don't have a ? in the string

>>> x.match("1234567,1234567")
<_sre.SRE_Match object at 0x023483C8>

I've also tried it with:

x = re.compile(r"^([0-9_\\?]{7})(,\1)*$")

But that just allows it to match the \ character (as expected).

What is wrong with my regex?

boatcoder asked Mar 02 '26 15:03


1 Answer

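\1 is a backreference: it matches the exact text that the referenced group matched, not anything the group's pattern could match. In "???????,1234567" the group captured "???????", so ,\1 can only match ",???????" and the match fails. If you want the pattern itself to be allowed again, write it out again: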

r"^([0-9_?]{7})(,[0-9_?]{7})*$"

(Also note that ? doesn’t need escaping inside a character set.)
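To see the difference side by side, here is a small sketch that runs both patterns over a few inputs. The ITEM variable and the sample strings are my own illustrations, not part of the original question; building the pattern from a shared fragment is just one way to avoid typing the character set twice.

```python
import re

# Original pattern: \1 repeats the exact text captured by group 1,
# so every item after the first must be identical to the first.
backref = re.compile(r"^([0-9_\?]{7})(,\1)*$")

# Fixed pattern: the character set is repeated, so each item is
# matched independently. ITEM is a hypothetical helper name.
ITEM = r"[0-9_?]{7}"
repeated = re.compile(rf"^{ITEM}(?:,{ITEM})*$")

samples = [
    "1234567,1234567",          # identical items
    "???????,1234567",          # different items
    "123456?,12_4567,???????",  # three different items
]

for s in samples:
    print(s, bool(backref.match(s)), bool(repeated.match(s)))
```

The backreference version only accepts the first sample, where both items are the same text; the repeated version accepts all three.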

Ry- answered Mar 05 '26 04:03