
Combine compiled Python regexes

Tags: python, regex

Is there any mechanism in Python for combining compiled regular expressions?

I know it's possible to compile a new expression by extracting the plain-old-string .pattern property from existing pattern objects. But this fails in several ways. For example:

import re

first = re.compile(r"(hello?\s*)")

# one-two-three or one/two/three - but not one-two/three or one/two-three
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)

# Incorrect - back-reference \1 would refer to the wrong capturing group now,
# and we get an error "redefinition of group name 'r1' as group 3; was 
# group 2 at position 47" for the `(?P<r1>)` group.
# Result is also now case-sensitive, unlike 'second' which is IGNORECASE
both = re.compile(first.pattern + second.pattern + second.pattern)
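For reference, the two problems can be reproduced separately. This is only a minimal sketch; the names `error` and `once` are illustrative and not part of the original snippet:

```python
import re

first = re.compile(r"(hello?\s*)")
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)

# Problem 1: using 'second' twice defines the named group r1 twice,
# which re.compile rejects.
try:
    re.compile(first.pattern + second.pattern + second.pattern)
    error = None
except re.error as exc:
    error = exc
print(error)

# Problem 2: using 'second' once compiles, but .pattern is a plain string,
# so the IGNORECASE flag is not carried into the combined expression.
once = re.compile(first.pattern + second.pattern)
print(bool(once.search("hello one-two-THREE")))  # False
print(bool(second.search("one-two-THREE")))      # True
```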

The result I'm looking for is achievable like so in Perl:

$first = qr{(hello?\s*)};

# one-two-three or one/two/three - but not one-two/three or one/two-three
$second = qr{one([-/])two\g{-1}three}i;

$both = qr{$first$second$second};

A test shows the results:

test($second, "...one-two-three...");                   # Matches
test($both, "...hello one-two-THREEone-two-three...");  # Matches
test($both, "...hellone/Two/ThreeONE-TWO-THREE...");    # Matches
test($both, "...HELLO one/Two/ThreeONE-TWO-THREE...");  # No match

sub test {
  my ($pat, $str) = @_;
  print $str =~ $pat ? "Matches\n" : "No match\n";
}

Is there a library somewhere that makes this use case possible in Python? Or a built-in feature I'm missing somewhere?

(Note - one very useful feature in the Perl regex above is \g{-1}, which unambiguously refers to the immediately preceding capture group, so there are no collisions of the kind Python complains about when I try to compile the combined expression. I haven't seen an equivalent anywhere in the Python world, and I'm not sure whether there's an alternative I haven't thought of.)

Ken Williams asked Feb 23 '18 22:02


1 Answer

Ken, this is an interesting problem, and I agree that the Perl solution is very slick. I came up with something less elegant, but it may give you some ideas for exploring a Python solution further. The idea is to simulate the concatenation with Python's re methods: search for first, then require second to match immediately after it, twice in a row.

import re

first = re.compile(r"(hello?\s*)")
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)

# 'text' rather than 'str', so the built-in str type is not shadowed
text = "...hello one-two-THREEone/two/three..."
#text = "...hellone/Two/ThreeONE-TWO-THREE..."

# Find 'first' anywhere, then require 'second' to match twice in a row,
# each time starting exactly where the previous match ended.
m1 = first.search(text)
if m1:
    m2 = second.match(text, m1.end())
    if m2:
        m3 = second.match(text, m2.end())
        if m3:
            print('Matches')

It works in most cases, but not for the input below: the greedy hello? in first consumes the "o" that second needs, and this step-by-step approach never backtracks into first:

...hellone/Two/ThreeONE-TWO-THREE...
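To see where the step-by-step approach loses this input, the sketch below looks at what `first` consumes on its own and what is left over for `second`. The names `consumed` and `rest` are only illustrative:

```python
import re

first = re.compile(r"(hello?\s*)")
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)

text = "...hellone/Two/ThreeONE-TWO-THREE..."

m = first.search(text)
consumed = m.group(0)   # what 'first' matched when run in isolation
rest = text[m.end():]   # what remains for 'second' to match

print(repr(consumed))      # 'hello'
print(repr(rest[:12]))     # 'ne/Two/Three'
print(second.match(rest))  # None
```

A single combined regex would retry with `first` matching only "hell"; separate `search`/`match` calls have no way to do that.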

So I admit this is not a complete solution to your problem, but I hope it helps.
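One direction that might be worth exploring is to build a single combined pattern after all, so the regex engine can backtrack across the pieces. The sketch below is an assumption-laden illustration, not a library feature: `embed` is a hypothetical helper that appends a suffix to named groups to avoid the redefinition error, and wraps each piece in a scoped inline flag group such as `(?i:...)` (available since Python 3.6) so IGNORECASE stays local to the piece it came from. It only handles the IGNORECASE flag and named groups, and it rewrites the pattern source with simple substitutions, so numbered back-references like \1 or an escaped `\(` followed by `?P<` would not be handled correctly.

```python
import re

def embed(pattern, suffix):
    """Return the source of a compiled pattern, wrapped for concatenation.

    Hypothetical helper, not part of the re module. Named groups get
    `suffix` appended so repeated use does not collide, and IGNORECASE is
    kept local to this fragment through a scoped inline flag.
    """
    src = pattern.pattern
    # Rename group definitions (?P<name>...) and back-references (?P=name).
    src = re.sub(r"\(\?P<(\w+)>",
                 lambda m: "(?P<%s%s>" % (m.group(1), suffix), src)
    src = re.sub(r"\(\?P=(\w+)\)",
                 lambda m: "(?P=%s%s)" % (m.group(1), suffix), src)
    # "(?i:...)" if the piece was case-insensitive, plain "(?:...)" otherwise.
    flag = "i" if pattern.flags & re.IGNORECASE else ""
    return "(?%s:%s)" % (flag, src)

first = re.compile(r"(hello?\s*)")
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)

both = re.compile(embed(first, "_a") + embed(second, "_b") + embed(second, "_c"))

for text in ("...hello one-two-THREEone-two-three...",    # Matches
             "...hellone/Two/ThreeONE-TWO-THREE...",      # Matches
             "...HELLO one/Two/ThreeONE-TWO-THREE..."):   # No match
    print("Matches" if both.search(text) else "No match")
```

With these inputs the combined pattern gives the same three results as the Perl test in the question, including the "hellone" case that the step-by-step version misses.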

Pulkit Kansal answered Oct 31 '22 06:10