I'm trying to build a regular expression that matches regular expressions between two forward slashes. My main problem is that regular expressions themselves can contain forward slashes, escaped by a backslash. I try to filter them out with a negative lookbehind assertion (only match the closing slash if there is no backslash immediately before it). However, now I don't get a match if the regex itself actually ends with an escaped backslash.
test program:
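#!/usr/bin/python
import re

# Three inputs: plain, containing escaped slashes, and ending in an escaped backslash.
teststrings = [
    """/hello world/""",
    """/string with foreslash here \/ and here\//""",
    """/this one ends with backlash\\\\/"""]

# The closing slash is only accepted if it is not preceded by a backslash.
patt = """^\/(?P<pattern>.*)(?<!\\\\)\/$"""

for t in teststrings:
    m = re.match(patt, t)
    if m is not None:
        print(t + " => MATCH")
    else:
        print(t + " => NO MATCH")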
output:
/hello world/ => MATCH
/string with foreslash here \/ and here\// => MATCH
/this one ends with backlash\\/ => NO MATCH
How would I modify the assertion to only hit if there is a single backslash at the current position, but not two?
Or is there a better way to extract the regex? (Note: in the actual file I'm trying to parse, the lines contain more than just the regex. I can't simply search for the first and last slash per line and take everything in between.)
Try this:
pattern = re.compile(r"^/(?:\\.|[^/\\])*/")
Explanation:
^            # Start of string
/            # Match /
(?:          # Match either...
  \\.        #   an escaped character
|            # or
  [^/\\]     #   any character except slash/backslash
)*           # any number of times.
/            # Match /
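As a quick check, the pattern can be run against the three test strings from the question. This is only a sketch under a few assumptions, not the exact one-liner: it borrows the named group "pattern" and the trailing $ from the question's own regex, spells the pattern out with re.VERBOSE, and writes the inputs as raw strings. The helper name extract_body is made up for illustration.

```python
import re

# Verbose spelling of the pattern. The named group and the trailing $
# come from the question's regex; they are additions, not part of the
# one-liner above.
SLASH_DELIMITED = re.compile(r"""
    ^/                 # opening slash at the start of the string
    (?P<pattern>       # capture the body
        (?: \\.        #   a backslash plus the character it escapes
          | [^/\\]     #   or any character that is neither / nor backslash
        )*
    )
    /$                 # closing slash at the end of the string
""", re.VERBOSE)


def extract_body(text):
    """Return the text between the delimiting slashes, or None (hypothetical helper)."""
    m = SLASH_DELIMITED.match(text)
    return m.group("pattern") if m else None


tests = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backslash\\/",   # ends with an escaped backslash
    r"/unterminated \/",                    # the last slash is escaped
]

for t in tests:
    print(t, "=>", "MATCH" if extract_body(t) is not None else "NO MATCH")
```

The first three strings match and the fourth does not. Because a backslash is always consumed together with the character after it, a run of backslashes is paired off from the left, which is what the single-character lookbehind cannot do.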
For your "real-world" application (finding the first "slash-delimited string", disregarding escaped slashes), I'd use
pattern = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")
This gets you the following:
>>> pattern.match("foo /bar/ baz").group(1)
'bar'
>>> pattern.match("foo /bar\/bam/ baz").group(1)
'bar\\/bam'
>>> pattern.match("foo /bar/bam/ baz").group(1)
'bar'
>>> pattern.match("foo\/oof /bar\/bam/ baz").group(1)
'bar\\/bam'
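For use on a whole file, the same pattern could be wrapped in a small helper that returns None for lines without a slash-delimited part. The following is a minimal sketch; the function name first_slash_delimited and the sample line are assumptions for illustration only.

```python
import re

# Same pattern as above: skip a prefix that contains no unescaped slash,
# then capture the first slash-delimited body.
FIRST_DELIMITED = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")


def first_slash_delimited(line):
    """Return the body of the first /.../ in line, or None if there is none."""
    m = FIRST_DELIMITED.match(line)
    return m.group(1) if m else None


body = first_slash_delimited(r"foo\/oof /bar\/bam/ baz")
print(body)                                  # bar\/bam

# The extracted body is itself regex source, so it can be compiled directly.
inner = re.compile(body)
print(bool(inner.search("bar/bam")))         # True
```

Returning None instead of calling .group(1) on a failed match avoids an AttributeError on lines that contain no delimited regex at all.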