
lookbehind for start of string or a character

The command

re.compile(ur"(?<=,| |^)(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

throws a

sre_constants.error: look-behind requires fixed-width pattern

error in my program but regex101 shows it to be fine.

What I'm trying to do here is to match landmarks from addresses (each address is in a separate string) like:

  • "Opp foobar, foocity" --> Must match "Opp foobar"
  • "Fooplace, near barplace, barcity" --> Must match "near barplace"
  • "Fooplace, Shoppers Stop, foocity" --> Must match nothing
  • "Fooplace, opp barplace" --> Must match "opp barplace"

The lookbehind is to avoid matching words with opp in them (like in string 3).

Why is that error thrown? Is there an alternative to what I'm looking for?

anupamGak asked Jun 15 '15 08:06



1 Answer

re.compile(ur"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)", re.IGNORECASE)

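Python's re module only accepts a lookbehind whose pattern has a fixed width. In (?<=,| |^) the comma and the space are each one character wide, but ^ is zero-width, so the pattern is rejected. regex101 probably accepted it because its default PCRE flavor allows lookbehind alternatives of different lengths. The fix is to move ^ out of the lookbehind and combine the comma and the space into a character class with []: (?:^|(?<=[, ])). See the demo.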

https://regex101.com/r/vA8cB3/2#python
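
To check the pattern against the four addresses from the question, a minimal sketch along these lines should work. It assumes Python 3, so the pattern uses a plain r"" prefix (the ur"" prefix is Python 2 only), and the names LANDMARK, find_landmark and compiles are made up for this illustration.

```python
import re

# Same pattern as above, with a Python 3 raw-string prefix.
LANDMARK = re.compile(
    r"(?:^|(?<=[, ]))(?:next to|near|beside|opp).+?(?=,|$)",
    re.IGNORECASE,
)


def find_landmark(address):
    """Return the first landmark phrase in address, or None if there is none."""
    match = LANDMARK.search(address)
    return match.group(0) if match else None


def compiles(pattern):
    """Report whether the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


addresses = [
    "Opp foobar, foocity",
    "Fooplace, near barplace, barcity",
    "Fooplace, Shoppers Stop, foocity",
    "Fooplace, opp barplace",
]

for address in addresses:
    print(repr(address), "->", repr(find_landmark(address)))

# The lookbehind from the question mixes one-character and zero-width
# alternatives, so re refuses to compile it; the rewritten prefix is accepted.
print(compiles(r"(?<=,| |^)opp"))
print(compiles(r"(?:^|(?<=[, ]))opp"))
```

The third address yields None because the "opp" inside "Shoppers" is preceded by "h", which is neither a comma, a space, nor the start of the string.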

vks answered Oct 19 '22 14:10