How do I make Python's negative lookbehind less greedy?

Q: What is Lookbehind in regex?

A regex lookbehind is an assertion in Python regular expressions (the `re` module) that succeeds or fails depending on whether a pattern matches the text immediately behind, i.e. to the left of, the engine's current position. Lookbehinds and lookaheads consume no characters, which is why they are called zero-width assertions.

Tags: python, regex

I've read all related posts and scoured the internet but this is really beating me.

I have some text containing a date.
I would like to capture the date, but not if it's preceded by a certain phrase.

A straightforward solution is to add a negative lookbehind to my RegEx.

Here are some examples (using findall).
I only want to capture the date if it isn't preceded by the phrase "as of".

19-2-11
something something 15-4-11
such and such as of 29-5-11

Here is my regular expression:

(?<!as of )(\d{1,2}-\d{1,2}-\d{2})

Expected results:

['19-2-11']
['15-4-11']
[]

Actual results:

['19-2-11']
['15-4-11']
['9-5-11']

Notice that's 9 not 29. If I change \d{1,2} to something solid like \d{2} on the first pattern:

bad regex for testing: (?<!as of )(\d{2}-\d{1,2}-\d{2})

Then I get my expected results. Of course this is no good because I'd like to match 2-digit days as well as single-digit days.
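A minimal reproduction of both patterns, as a sketch that assumes each example line is searched separately with `re.findall`. The variable names are illustrative only.

```python
import re

lines = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]

# Pattern from the question: day is one or two digits.
original = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")
# Test pattern: day forced to exactly two digits.
fixed_width_day = re.compile(r"(?<!as of )(\d{2}-\d{1,2}-\d{2})")

original_results = [original.findall(line) for line in lines]
fixed_width_results = [fixed_width_day.findall(line) for line in lines]

print(original_results)     # [['19-2-11'], ['15-4-11'], ['9-5-11']]
print(fixed_width_results)  # [['19-2-11'], ['15-4-11'], []]

# The two-digit version misses dates with a single-digit day.
single_digit_day = fixed_width_day.findall("9-2-11")
print(single_digit_day)     # []
```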

Apparently my negative lookbehind is quite greedy -- more so than my date capture, so it's stealing a digit from it and failing. I've tried every means of correcting the greed I can think of, but I just don't know how to fix this.

I'd like my date capture to match with the utmost greed, and then have my negative lookbehind applied. Is this possible? My problem seemed like a good use of negative lookbehinds and not overly complicated. I'm sure I could accomplish it another way if I must, but I'd like to learn how to do this.


asked May 02 '12 20:05

Christopher Galpin

1 Answer

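This has nothing to do with greediness. Greediness doesn't change whether a regular expression matches or not - it changes only the order in which the search is performed. The lookbehind does reject the match that starts at the 2 in 29-5-11, but the engine then tries again one character later, at the 9. The six characters before the 9 are "s of 2", not "as of ", so the lookbehind passes there and 9-5-11 matches. The problem here is that your regular expression needs to be more specific to avoid unwanted matches.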
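One way to check this is to look at where the match starts and which characters sit directly before it. This is only an illustrative sketch; the slice width of 6 is chosen to equal the length of "as of ".

```python
import re

text = "such and such as of 29-5-11"
pattern = re.compile(r"(?<!as of )(\d{1,2}-\d{1,2}-\d{2})")

m = pattern.search(text)
start = m.start()
# The six characters the lookbehind compares against "as of ".
preceding = text[max(0, start - 6):start]

print(m.group(1))       # 9-5-11
print(text[start - 1])  # 2  (the digit that was skipped)
print(repr(preceding))  # 's of 2'
```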

To fix it you could require a word boundary just before your match:

(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})
#          ^^ add this
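A quick check of the word-boundary version against the example lines from the question, plus two single-digit-day inputs added here purely for illustration:

```python
import re

fixed = re.compile(r"(?<!as of )\b(\d{1,2}-\d{1,2}-\d{2})")

lines = [
    "19-2-11",
    "something something 15-4-11",
    "such and such as of 29-5-11",
]
results = [fixed.findall(line) for line in lines]
print(results)  # [['19-2-11'], ['15-4-11'], []]

# Illustrative extra inputs (not from the question): single-digit days.
allowed = fixed.findall("due 9-5-11")
blocked = fixed.findall("as of 9-5-11")
print(allowed)  # ['9-5-11']
print(blocked)  # []
```

The `\b` cannot match between the 2 and the 9 of 29, since both are word characters, so the engine can no longer start a match in the middle of the day number.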

answered Oct 28 '22 22:10

Mark Byers
