Substitutions are language elements that are recognized only within replacement patterns. They use a regular expression pattern to define all or part of the text that is to replace matched text in the input string. The replacement pattern can consist of one or more substitutions along with literal characters.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d" "o" and "g" .
Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match itself. For example, the regex x matches substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also matches itself. For example, = matches "=" ; @ matches "@" .
No, when using the standard library re
module, regular expression patterns cannot be 'symbolized'.
You can always do so by re-using Python variables, of course:
digit_letter_letter_digit = r'\d\w\w\d'
then use string formatting to build the larger pattern:
match(r"{0},{0}".format(digit_letter_letter_digit), inputtext)
or, using Python 3.6+ f-strings:
dlld = r'\d\w\w\d'
match(fr"{dlld},{dlld}", inputtext)
I often do use this technique to compose larger, more complex patterns from re-usable sub-patterns.
If you are prepared to install an external library, then the regex
project can solve this problem with a regex subroutine call. The syntax (?<digit>)
re-uses the pattern of an already used (implicitly numbered) capturing group:
(\d\w\w\d),(?1)
^........^ ^..^
| \
| re-use pattern of capturing group 1
\
capturing group 1
You can do the same with named capturing groups, where (?<groupname>...)
is the named group groupname
, and (?&groupname)
, (?P&groupname)
or (?P>groupname)
re-use the pattern matched by groupname
(the latter two forms are alternatives for compatibility with other engines).
And finally, regex
supports the (?(DEFINE)...)
block to 'define' subroutine patterns without them actually matching anything at that stage. You can put multiple (..)
and (?<name>...)
capturing groups in that construct to then later refer to them in the actual pattern:
(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)
^...............^ ^......^ ^......^
| \ /
creates 'dlld' pattern uses 'dlld' pattern twice
Just to be explicit: the standard library re
module does not support subroutine patterns.
Note: this will work with PyPi regex module, not with re
module.
You could use the notation (?group-number)
, in your case:
(\d\w\w\d),(?1)
it is equivalent to:
(\d\w\w\d),(\d\w\w\d)
Be aware that \w
includes \d
. The regex will be:
(\d[a-zA-Z]{2}\d),(?1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With