The pattern (?<!(asp|php|jsp))\?.*
works in PCRE, but it doesn't work in Python.
So what can I do to get this regex working in Python? (Python 2.7)
The (?<!\$) is a negative lookbehind that succeeds only when the character immediately before the current position is not a $ sign. The \d+ matches a number with one or more digits.
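To see what that lookbehind does in practice, here is a small sketch. The combined pattern (?<!\$)\d+ and the sample string are my own assumptions for illustration. Note that the lookbehind only inspects the single character before the position where \d+ starts, so part of a number that follows a $ can still match.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "$45 30"

# (?<!\$) only checks the one character before the start of \d+.
# At "4" the previous character is "$", so that position is rejected,
# but at "5" the previous character is "4", so "5" still matches.
numbers = re.findall(r"(?<!\$)\d+", text)
print(numbers)  # ['5', '30']
```

If the whole number after a $ should be skipped, the pattern would need an extra condition; the lookbehind alone does not do that.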
In a negative lookbehind, the regex engine checks the text immediately before the current position against the lookbehind's pattern. If that text matches, the overall match fails at this position; otherwise matching proceeds.
Lookbehind has the same effect as lookahead, but works backwards. It tells the regex engine to temporarily step backwards in the string to check whether the text inside the lookbehind can be matched there. (?<!a)b matches a “b” that is not preceded by an “a”, using negative lookbehind.
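A quick way to check the (?<!a)b example is to list the positions where it matches. This is only a sketch; the test string is an assumption chosen so that one "b" follows an "a" and two do not.

```python
import re

# Made-up test string: "b" at index 1 follows "a"; "b" at 4 and 6 do not.
text = "ab cb b"

# Collect the start index of every "b" not preceded by "a".
positions = [m.start() for m in re.finditer(r"(?<!a)b", text)]
print(positions)  # [4, 6]
```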
A lookahead is an assertion in Python regular expressions that succeeds or fails depending on whether a pattern matches ahead of, i.e. to the right of, the engine's current position. Lookarounds do not consume any characters, which is why they are called zero-width assertions.
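The "zero-width" part is easiest to see by looking at what ends up in the match. In this sketch the pattern foo(?=bar) and both strings are placeholders of mine, not something from the question.

```python
import re

# Positive lookahead: "foo" must be followed by "bar",
# but "bar" itself is not part of the match.
m = re.search(r"foo(?=bar)", "foobar")
print(m.group())  # foo
print(m.end())    # 3

# When "bar" does not follow, there is no match at all.
no_match = re.search(r"foo(?=bar)", "foobaz")
print(no_match)   # None
```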
It works perfectly fine for me. Are you maybe using it wrong? Make sure to use re.search instead of re.match:
>>> import re
>>> s = 'somestring.asp?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
>>> s = 'somestring.xml?1=123'
>>> re.search(r"(?<!(asp|php|jsp))\?.*", s)
<_sre.SRE_Match object at 0x0000000002DCB098>
Which is exactly how your pattern should behave. As glglgl mentioned, you can get the match if you assign that Match object to a variable (say m) and then call m.group(). That yields ?1=123.
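Here is roughly the same check as a script instead of an interactive session. It is a sketch that should run on both Python 2.7 and Python 3; the variable names are mine. It also shows why re.match looks like a failure: it only tries to match at the start of the string, where there is no "?".

```python
import re

pattern = re.compile(r"(?<!(asp|php|jsp))\?.*")

results = {}
for s in ("somestring.asp?1=123", "somestring.xml?1=123"):
    m = pattern.search(s)
    # m is None when the "?" is preceded by asp, php or jsp.
    results[s] = m.group() if m else None
    print("%s -> %s" % (s, results[s]))

# match() anchors at position 0, so it finds nothing here
# even though search() succeeds on the same string.
anchored = pattern.match("somestring.xml?1=123")
print(anchored)  # None
```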
By the way, you can leave out the inner parentheses. This pattern is equivalent:
(?<!asp|php|jsp)\?.*
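If you want to convince yourself that both spellings behave the same, a small comparison like the one below should do. The sample strings are assumptions of mine. As far as I know, Python's re module requires a lookbehind to match text of a fixed length, which holds here because all three alternatives are three characters long. The only difference I would expect is that the first form also creates a capturing group.

```python
import re

with_group = re.compile(r"(?<!(asp|php|jsp))\?.*")
without_group = re.compile(r"(?<!asp|php|jsp)\?.*")

# Made-up sample inputs covering each alternative plus two non-matching cases.
samples = [
    "somestring.asp?1=123",
    "somestring.php?x=1",
    "somestring.jsp?y=2",
    "somestring.xml?1=123",
    "no query string here",
]

def matched_text(pattern, s):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(s)
    return m.group() if m else None

first = [matched_text(with_group, s) for s in samples]
second = [matched_text(without_group, s) for s in samples]
print(first == second)  # True
```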