Regular expression: Match string between two slashes if the string itself contains escaped slashes

Tags: python, regex

I'm trying to build a regular expression that matches regular expressions between two forward slashes. My main problem is that the regular expressions themselves can contain forward slashes, escaped by a backslash. I try to filter them out with a negative lookbehind assertion (only match the closing slash if it is not preceded by a backslash). However, now I don't get a match if the regex itself ends with an escaped backslash.

test program:

#!/usr/bin/python
import re
teststrings=[
     """/hello world/""", 
     """/string with foreslash here \/ and here\//""",
     """/this one ends with backlash\\\\/"""]

patt="""^\/(?P<pattern>.*)(?<!\\\\)\/$"""

for t in teststrings:
    m = re.match(patt, t)
    # Parenthesized print with a single string works on Python 2 and 3
    if m is not None:
        print(t + "  => MATCH")
    else:
        print(t + "  => NO MATCH")

output:

/hello world/  => MATCH
/string with foreslash here \/ and here\//  => MATCH
/this one ends with backlash\\/  => NO MATCH

How would I modify the assertion to only hit if there is a single backslash at the current position, but not two?

Or is there a better way to extract the regex? (Note: in the actual file I'm trying to parse, the lines contain more than just the regex, so I can't simply search for the first and last slash per line and take everything in between.)

Gryphius asked Dec 12 '11 11:12


1 Answer

Try this:

pattern = re.compile(r"^/(?:\\.|[^/\\])*/")

Explanation:

^       # Start of string
/       # Match /
(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^/\\] # any character except slash/backslash
)*      # any number of times.
/       # Match /
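
As a quick check, the pattern above can be run against the three test strings from the question. This is only a sketch: the strings are rewritten as raw strings (so each backslash appears once in the source per backslash in the data), and the `results` list is a name introduced here for illustration.

```python
import re

# Pattern from the answer: an opening slash, then any run of
# "backslash + any character" or "anything except slash/backslash",
# then the closing slash.
pattern = re.compile(r"^/(?:\\.|[^/\\])*/")

# The question's test strings, written as raw strings.
teststrings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",  # ends with an escaped backslash
]

# Record whether each string matches, then report it.
results = [(t, pattern.match(t) is not None) for t in teststrings]
for t, matched in results:
    print(t, "=> MATCH" if matched else "=> NO MATCH")
```

All three strings should report a match, including the one ending in an escaped backslash. Because a backslash is always consumed together with the character after it, the pattern never has to look backwards to decide whether a slash is escaped. Note that the pattern is not anchored at the end, so it also accepts trailing text after the closing slash.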

For your "real-world" application (finding the first "slash-delimited string", disregarding escaped slashes), I'd use

pattern = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")

This gets you the following:

>>> pattern.match("foo /bar/ baz").group(1)
'bar'
>>> pattern.match("foo /bar\/bam/ baz").group(1)
'bar\\/bam'
>>> pattern.match("foo /bar/bam/ baz").group(1)
'bar'
>>> pattern.match("foo\/oof /bar\/bam/ baz").group(1)
'bar\\/bam'
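
For parsing a file line by line, the same pattern could be wrapped in a small helper. The sketch below assumes Python 3; the function name `extract_regex` is hypothetical and not part of the original answer, and returning `None` for lines without a delimited span is a design choice made here.

```python
import re

# Second pattern from the answer: skip any prefix (treating "\x" pairs
# as single units), then capture the first unescaped /.../ span.
pattern = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")

def extract_regex(line):
    """Return the text between the first pair of unescaped slashes, or None."""
    m = pattern.match(line)
    return m.group(1) if m else None

# Example calls; raw strings keep the backslashes literal.
print(extract_regex(r"foo /bar\/bam/ baz"))
print(extract_regex("no delimiters here"))
```

The captured text still contains the escaping backslashes, so `\/` would need to be unescaped separately if the bare regex is wanted.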
Tim Pietzcker answered Nov 05 '22 01:11