How to write regex in Python with (?(DEFINE))?

Question

I would like to parse codetags in source files. I wrote this regex that works fine with PCRE:

(?<tag>(?&TAG)):\s*
(?<message>.*?)
(
<
   (?<author>(?:\w{3}\s*,\s*)*\w{3})?\s*
   (?<date>(?&DATE))?
   (?<flags>(?&FLAGS))?
>
)?
$

(?(DEFINE)
   (?<TAG>\b(NOTE|LEGACY|HACK|TODO|FIXME|XXX|BUG))
   (?<DATE>\d{4}-\d{2}-\d{2})
   (?<FLAGS>[pts]:\w+\b)
)

Unfortunately, Python doesn't seem to understand the (?(DEFINE)...) block (https://regex101.com/r/qH1uG3/1#pcre).

What is the best workaround in Python?

Casimir et Hippolyte · Accepted Answer

Using the regex module:

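As explained in the comments, the third-party regex module lets you call named subpatterns again with (?&NAME). Unfortunately, when this answer was written it had no (?(DEFINE)...) syntax like Perl or PCRE (more recent releases of regex document support for it).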

The workaround is the same as in Ruby: put a {0} quantifier after a named group that should only act as a definition. The group then matches nothing where it is written, but it can still be called with (?&NAME) elsewhere:

import regex

s = r'''
// NOTE: A small example
// HACK: Another example <ABC 2014-02-03>
// HACK: Another example <ABC,DEF 2014-02-03>
// HACK: Another example <ABC,DEF p:0>
'''

p = r'''
    # subpattern definitions
    (?<TAG> \b(?:NOTE|LEGACY|HACK|TODO|FIXME|XXX|BUG) ){0}
    (?<DATE> \d{4}-\d{2}-\d{2} ){0}
    (?<FLAGS> [pts]:\w+ ){0}

    # main pattern
    (?<tag> (?&TAG) ) : \s*
    (?<message> (?>[^\s<]+[^\n\S]+)* [^\s<]+ )? \s* # to trim the message
    (?:                      # optional metadata block, as in the question's pattern
        <
        (?<author> (?: \w{3} \s* , \s* )*+ \w{3} )? \s*
        (?<date> (?&DATE) )?
        (?<flags> (?&FLAGS) )?
        >
    )?
    $
'''

rgx = regex.compile(p, regex.VERBOSE | regex.MULTILINE)

for m in rgx.finditer(s):
    print(m.group('tag'))

Note: the subpatterns can be defined at the end of the pattern too.
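
If installing the third-party regex module is not an option, a similar kind of reuse can be approximated with the standard re module by expanding the (?&NAME) calls textually before compiling. This is not the method from the answer above, only a minimal sketch: the names DEFINITIONS, MAIN, CODETAG and expand_subpatterns are made up for illustration, the named groups use re's (?P<name>...) syntax, and the message part is simplified to a lazy match instead of the atomic group.

```python
import re

# Reusable fragments, standing in for the (?(DEFINE)...) block.
# (Hypothetical names; not part of any library.)
DEFINITIONS = {
    'TAG': r'\b(?:NOTE|LEGACY|HACK|TODO|FIXME|XXX|BUG)',
    'DATE': r'\d{4}-\d{2}-\d{2}',
    'FLAGS': r'[pts]:\w+',
}

# Main pattern, written with PCRE-style (?&NAME) calls as placeholders.
# The stdlib re module needs (?P<name>...) for named groups.
MAIN = r'''
    (?P<tag> (?&TAG) ) : [^\S\n]*
    (?P<message> [^<\n]*? ) [^\S\n]*      # lazy, so trailing blanks are trimmed
    (?:                                   # optional metadata block
        <
        (?P<author> (?: \w{3} \s* , \s* )* \w{3} )? \s*
        (?P<date> (?&DATE) )? \s*
        (?P<flags> (?&FLAGS) )?
        >
    )?
    $
'''


def expand_subpatterns(pattern, definitions):
    """Replace each (?&NAME) with a non-capturing copy of definitions[NAME]."""
    return re.sub(
        r'\(\?&(\w+)\)',
        lambda m: '(?:' + definitions[m.group(1)] + ')',
        pattern,
    )


CODETAG = re.compile(expand_subpatterns(MAIN, DEFINITIONS),
                     re.VERBOSE | re.MULTILINE)

s = r'''
// NOTE: A small example
// HACK: Another example <ABC 2014-02-03>
// HACK: Another example <ABC,DEF 2014-02-03>
// HACK: Another example <ABC,DEF p:0>
'''

# One tuple per codetag: (tag, message, author, date, flags).
results = [m.group('tag', 'message', 'author', 'date', 'flags')
           for m in CODETAG.finditer(s)]
for row in results:
    print(row)
```

Because the expansion is plain text substitution, it cannot express recursive subpatterns, and a definition is pasted in once per call, so the compiled pattern grows with each use. For anything beyond simple fragments like these, the regex module remains the closer match to PCRE.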

Tags: python, regex

nowox
