
How do I make Python's negative lookbehind less greedy?

Tags: python, regex

I've read all related posts and scoured the internet but this is really beating me.

I have some text containing a date.
I would like to capture the date, but not if it's preceded by a certain phrase.

A straightforward solution is to add a negative lookbehind to my RegEx.

Here are some examples (using findall).
I only want to capture the date if it isn't preceded by the phrase "as of".

19-2-11
something something 15-4-11
such and such as of 29-5-11

Here is my regular expression:

(?<!as of )(\d{1,2}-\d{1,2}-\d{2})

Expected results:

['19-2-11']
['15-4-11']
[]

Actual results:

['19-2-11']
['15-4-11']
['9-5-11']

Notice that's 9 not 29. If I change \d{1,2} to something solid like \d{2} on the first pattern:

bad regex for testing: (?<!as of )(\d{2}-\d{1,2}-\d{2})

Then I get my expected results. Of course this is no good because I'd like to match 2-digit days as well as single-digit days.
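For anyone who wants to reproduce this, here is a minimal sketch that runs both patterns from the question through `re.findall` on the three sample lines. The variable names (`samples`, `original`, `fixed_width`) are only illustrative and do not come from any real code base.

```python
import re

# The three sample inputs from the question.
samples = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]

# Pattern from the question: the day may be one or two digits.
original = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")
# "Bad regex for testing": the day must be exactly two digits.
fixed_width = re.compile(r"(?<!as of )(\d{2}-\d{1,2}-\d{2})")

original_results = [original.findall(s) for s in samples]
fixed_width_results = [fixed_width.findall(s) for s in samples]

print(original_results)     # [['19-2-11'], ['15-4-11'], ['9-5-11']]
print(fixed_width_results)  # [['19-2-11'], ['15-4-11'], []]
```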

Apparently my negative lookbehind is quite greedy, more so than my date capture, so it's stealing a digit from it and failing. I've tried every means of correcting the greed I can think of, but I just don't know how to fix this.

I'd like my date capture to match with the utmost greed, and then have my negative lookbehind applied. Is this possible? My problem seemed like a good use of negative lookbehinds and not overly complicated. I'm sure I could accomplish it another way if I must, but I'd like to learn how to do this.


Christopher Galpin asked May 02 '12 20:05




1 Answer

This has nothing to do with greediness. Greediness doesn't change whether a regular expression matches or not - it changes only the order in which the search is performed. The problem here is that your regular expression needs to be more specific to avoid unwanted matches.
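To see where the unwanted match comes from, it may help to look at what sits just before the match start. The sketch below is only an illustration, using the third sample line from the question. It shows that the lookbehind does reject the position at the `2`, but the engine then retries one character later, at the `9`, where the six preceding characters are no longer "as of ".

```python
import re

text = "such and such as of 29-5-11"
pattern = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")

m = pattern.search(text)
start = m.start(1)

# The lookbehind inspects the 6 characters immediately before the match start.
preceding = text[max(0, start - 6):start]
# One position earlier (at the "2"), those 6 characters are "as of ",
# so the lookbehind rejects that starting point.
preceding_at_two = text[start - 7:start - 1]

print(m.group(1))              # 9-5-11
print(repr(preceding))         # 's of 2'
print(repr(preceding_at_two))  # 'as of '
```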

To fix it you could require a word boundary just before your match:

(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})
#          ^^ add this
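As a quick check, here is a sketch that runs the pattern with the added `\b` against the three sample lines from the question. The names `samples` and `with_boundary` are just for illustration.

```python
import re

samples = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]

# \b requires a word boundary right before the first digit, so a match can
# no longer start in the middle of a number (between the "2" and the "9").
with_boundary = re.compile(r"(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})")

boundary_results = [with_boundary.findall(s) for s in samples]
print(boundary_results)  # [['19-2-11'], ['15-4-11'], []]
```

Note that `\b` also treats letters and underscores as word characters, so a date glued directly onto a letter (for example `x29-5-11`) would not match at the `2` either. Whether that matters depends on your input.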
Mark Byers answered Oct 28 '22 22:10
