I've read all related posts and scoured the internet but this is really beating me.
I have some text containing a date.
I would like to capture the date, but not if it's preceded by a certain phrase.
A straightforward solution is to add a negative lookbehind to my RegEx.
Here are some examples (using findall).
I only want to capture the date if it isn't preceded by the phrase "as of".
19-2-11
something something 15-4-11
such and such as of 29-5-11
Here is my regular expression:
(?<!as of )(\d{1,2}-\d{1,2}-\d{2})
Expected results:
['19-2-11']
['15-4-11']
[]
Actual results:
['19-2-11']
['15-4-11']
['9-5-11']
Notice that's 9, not 29. If I change \d{1,2} to something solid like \d{2} on the first pattern:
bad regex for testing: (?<!as of )(\d{2}-\d{1,2}-\d{2})
Then I get my expected results. Of course this is no good because I'd like to match 2-digit days as well as single-digit days.
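The behaviour described above can be reproduced with a short script. This is only a sketch: the names `samples`, `flexible`, and `fixed_width` are made up for illustration, while the sample strings and both patterns are the ones from the question.

```python
import re

# The three sample strings from the question.
samples = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]

# Original pattern: 1 or 2 digits for the day.
flexible = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")
# Test pattern: exactly 2 digits for the day.
fixed_width = re.compile(r"(?<!as of )(\d{2}-\d{1,2}-\d{2})")

flexible_results = [flexible.findall(s) for s in samples]
fixed_results = [fixed_width.findall(s) for s in samples]

print(flexible_results)  # [['19-2-11'], ['15-4-11'], ['9-5-11']]
print(fixed_results)     # [['19-2-11'], ['15-4-11'], []]
```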
Apparently my negative lookbehind is quite greedy, more so than my date capture, so it's stealing a digit from it and failing. I've tried every means of correcting the greed I can think of, but I just don't know how to fix this.
I'd like my date capture to match with the utmost greed, and then have my negative lookbehind applied. Is this possible? My problem seemed like a good use of negative lookbehinds and not overly complicated. I'm sure I could accomplish it another way if I must, but I'd like to learn how to do this.
How do I make Python's negative lookbehind less greedy?
JavaScript historically supported only positive and negative lookahead, with no support for lookbehind (lookbehind assertions were added in ES2018). In engines without lookbehind, you can still mimic it in JavaScript using callbacks.
Greedy quantifiers match as many repetitions of the preceding element as possible, returning the longest match possible. Non-greedy quantifiers are the opposite: they match as few repetitions as possible, returning the shortest match possible.
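As a rough illustration of greedy versus non-greedy quantifiers, the sketch below applies both forms of `\d{1,2}` to the date from the question. The variable names are hypothetical. Note that once the quantifier sits inside the full date pattern, both forms find the same match, because the non-greedy version is forced to expand until the rest of the pattern fits.

```python
import re

# On its own, the quantifier's greediness decides how many digits are taken.
greedy = re.match(r"\d{1,2}", "29-5-11").group()   # takes both digits
lazy = re.match(r"\d{1,2}?", "29-5-11").group()    # takes only one digit

# Inside the full date pattern, both variants end up with the same match.
full_greedy = re.findall(r"\d{1,2}-\d{1,2}-\d{2}", "29-5-11")
full_lazy = re.findall(r"\d{1,2}?-\d{1,2}?-\d{2}", "29-5-11")

print(greedy, lazy)            # 29 2
print(full_greedy, full_lazy)  # ['29-5-11'] ['29-5-11']
```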
Negative lookbehind syntax: (?<!element)match, where match is the item to match and element is the character, characters, or group that must not precede the match for it to count as a successful match. So if you want to avoid matching a token when a certain token precedes it, you can use negative lookbehind. For example / (? <!
A regex lookbehind is used as an assertion in Python regular expressions (re) to test whether a pattern is present behind, i.e. to the left of, the parser's current position. Assertions don't consume any characters. Hence, lookbehind and lookahead are termed zero-width assertions.
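A small sketch of what "zero-width" means in practice, using a positive lookbehind on the phrase from the question. The names `text`, `m`, and `variable_width_ok` are illustrative. The second part relies on the fact that Python's built-in `re` module only accepts fixed-width lookbehind patterns.

```python
import re

text = "as of 29-5-11"

# The lookbehind checks the 6 characters to the left of the current position
# but is not part of the match: only the digit itself is consumed.
m = re.search(r"(?<=as of )\d", text)
print(m.group(), m.span())  # 2 (6, 7)

# Python's re module rejects lookbehinds whose width can vary.
try:
    re.compile(r"(?<!as of\s+)\d")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```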
This has nothing to do with greediness. Greediness doesn't change whether a regular expression matches or not - it changes only the order in which the search is performed. The problem here is that your regular expression needs to be more specific to avoid unwanted matches.
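To see why the unwanted match appears, it may help to look at what actually precedes the match the engine finds. In the sketch below (variable names are illustrative), the attempt starting at "2" is rejected by the lookbehind, so the engine moves one character to the right and tries again at "9". There the six preceding characters are "s of 2", not "as of ", so the lookbehind is satisfied and "9-5-11" is a perfectly valid match.

```python
import re

pattern = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")
text = "such and such as of 29-5-11"

m = pattern.search(text)
start = m.start()

# "as of " is 6 characters long, so the lookbehind inspects these 6 characters.
preceding = text[start - 6:start]

print(m.group(1))       # 9-5-11
print(repr(preceding))  # 's of 2'
```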
To fix it you could require a word boundary just before your match:
(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})
#          ^^ add this
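A quick check of the word-boundary version against the three sample strings from the question. This is a sketch with illustrative variable names. The `\b` prevents a match from starting between two digits, so the engine can no longer drop the leading "2" of "29" to get past the lookbehind.

```python
import re

samples = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]

# Same pattern as before, with \b added just before the date.
with_boundary = re.compile(r"(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})")

boundary_results = [with_boundary.findall(s) for s in samples]
print(boundary_results)  # [['19-2-11'], ['15-4-11'], []]
```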