I am attempting to using part of a regular expression as input for a later part of the regular expression.
What I have so far (which fails the assertions):
import re
regex = re.compile(r"(?P<length>\d+)(\d){(?P=length)}")
assert bool(regex.match("3123")) is True
assert bool(regex.match("100123456789")) is True
Breaking this down, the first digit(s) signify how many digits should be matched afterward. In the first assertion, I get a 3
, as the first character, which means there should be exactly three digits after, otherwise there are more than 9 digits. If there are more than 9 digits, then the first group will need to be expanded and checked against the rest of the digits.
The regular expression 3(\d){3}
would properly match the first assertion, however I cannot get the regular expression to match the general case where the braces {}
are fed a regular expression backreference: {(?P=length)}
Calling the regular expression with the re.DEBUG
flag I get:
subpattern 1
max_repeat 1 4294967295
in
category category_digit
subpattern 2
in
category category_digit
literal 123
groupref 1
literal 125
It looks like the braces {
(123
) and }
(125
) are being interpreted as literals when there is a backreference inside of them.
When there is no backreference, such as {3}
, I can see that {3}
is being interpreted as max_repeat 3 3
Is using a backreference as part of a regular expression possible?
A backreference in a regular expression identifies a previously matched group and looks for exactly the same text again. A simple example of the use of backreferences is when you wish to look for adjacent, repeated words in some text. The first part of the match could use a pattern that extracts a single word.
The syntax is \N or \g<N> where N is the capture group you want.
To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).
There is no way to put the backreference as a limiting quantifier argument inside the pattern. To solve your current task, I can suggest the following code (see inline comments explaining the logic):
import re
def checkNum(s):
first = ''
if s == '0':
return True # Edge case, 0 is valid input
m = re.match(r'\d{2,}$', s) # The string must be all digits, at least 2
if m:
lim = ''
for i, n in enumerate(s):
lim = lim + n
if re.match(r'{0}\d{{{0}}}$'.format(lim), s):
return True
elif int(s[0:i+1]) > len(s[i+1:]):
return False
print(checkNum('3123')) # Meets the pattern (123 is 3 digit chunk after 3)
print(checkNum('1234567')) # Does not meet the pattern, just 7 digits
print(checkNum('100123456789')) # Meets the condition, 10 is followed with 10 digits
print(checkNum('9123456789')) # Meets the condition, 9 is followed with 9 digits
See the Python demo online.
The pattern used with re.match
(that anchors the pattern at the start of the string) is {0}\d{{{0}}}$
that will look like 3\d{3}$
in case the 3123
is passed to the checkNum
method. It will match a string that starts with 3
, and then will match exactly 3 digits followed with end of string marker ($
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With