The command
re.compile(ur"(?<=,| |^)(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)
throws a
sre_constants.error: look-behind requires fixed-width pattern
error in my program but regex101 shows it to be fine.
What I'm trying to do here is to match landmarks from addresses (each address is in a separate string) like:
The lookbehind is to avoid matching words with opp
in them (like in string 3).
Why is that error thrown? Is there an alternative to what I'm looking for?
Regex Lookbehind is used as an assertion in Python regular expressions(re) to determine success or failure whether the pattern is behind i.e to the right of the parser's current position. They don't match anything. Hence, Regex Lookbehind and lookahead are termed as a zero-width assertion.
The lookbehind asserts that what immediately precedes the current position is a lowercase letter. And the lookahead asserts that what immediately follows the current position is an uppercase letter.
Positive and Negative Lookbehind Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a “b” that is not preceded by an “a”, using negative lookbehind.
Lookarounds are zero width assertions. They check for a regex (towards right or left of the current position - based on ahead or behind), succeeds or fails when a match is found (based on if it is positive or negative) and discards the matched portion.
re.compile(ur"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)
You can club 3
conditions using []
and |
.See demo.
https://regex101.com/r/vA8cB3/2#python
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With