Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write regex in Python with (?(DEFINE))?

Tags:

python

regex

I would like to parse codetags in source files. I wrote this regex that works fine with PCRE:

(?<tag>(?&TAG)):\s*
(?<message>.*?)
(
<
   (?<author>(?:\w{3}\s*,\s*)*\w{3})?\s*
   (?<date>(?&DATE))?
   (?<flags>(?&FLAGS))?
>
)?
$

(?(DEFINE)
   (?<TAG>\b(NOTE|LEGACY|HACK|TODO|FIXME|XXX|BUG))
   (?<DATE>\d{4}-\d{2}-\d{2})
   (?<FLAGS>[pts]:\w+\b)
)

Unfortunately it seems Python doesn't understand the DEFINE (https://regex101.com/r/qH1uG3/1#pcre)

What is the best workaround in Python?

like image 970
nowox Avatar asked Mar 18 '23 01:03

nowox


1 Answers

The way with the regex module:

As explained in comments the regex module allows to reuse named subpatterns. Unfortunately there is no (?(DEFINE)...) syntax like in Perl or PCRE.

So the way is to use the same workaround than with Ruby language that consists to put a {0} quantifier when you want to define a named subpattern:

import regex

s = r'''
// NOTE: A small example
// HACK: Another example <ABC 2014-02-03>
// HACK: Another example <ABC,DEF 2014-02-03>
// HACK: Another example <ABC,DEF p:0>
'''

p = r'''
    # subpattern definitions
    (?<TAG> \b(?:NOTE|LEGACY|HACK|TODO|FIXME|XXX|BUG) ){0}
    (?<DATE> \d{4}-\d{2}-\d{2} ){0}
    (?<FLAGS> [pts]:\w+ ){0}

    # main pattern
    (?<tag> (?&TAG) ) : \s*
    (?<message> (?>[^\s<]+[^\n\S]+)* [^\s<]+ )? \s* # to trim the message
    <
    (?<author> (?: \w{3} \s* , \s* )*+ \w{3} )? \s*
    (?<date> (?&DATE) )?
    (?<flags> (?&FLAGS) )?
    >
    $
'''

rgx = regex.compile(p, regex.VERBOSE | regex.MULTILINE)

for m in rgx.finditer(s):
    print (m.group('tag'))

Note: the subpatterns can be defined at the end of the pattern too.

like image 119
Casimir et Hippolyte Avatar answered Mar 19 '23 14:03

Casimir et Hippolyte