Regular Expression Lookbehind doesn't work with quantifiers ('+' or '*')

Q: Can I use Lookbehind regex?

The good news is that you can use lookbehind anywhere in the regex, not only at the start. If you want to find a word not ending with an “s”, you could use \b\w+(?<!s)\b.
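As a rough illustration of that pattern, here is a minimal sketch using Python's built-in re module. The sample sentence and the variable names are assumptions made up for this example.

```python
import re

# Sample text (an assumption for illustration only).
text = "cats and a dog chase birds"

# \b\w+ matches a whole word, then the negative lookbehind (?<!s)
# checks that the character just before the closing \b is not "s".
words_not_ending_in_s = re.findall(r"\b\w+(?<!s)\b", text)

print(words_not_ending_in_s)
```

Note that the lookbehind sits at the end of the pattern here, not at the start.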

Q: What is regular expression quantifier?

The +? quantifier matches the preceding element one or more times, but as few times as possible. It is the lazy counterpart of the greedy quantifier +. For example, the regular expression \b\w+?\b matches one or more word characters delimited by word boundaries.
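The difference between greedy and lazy matching is easier to see on a string where the two give different results. The following is a small sketch with Python's re module. The sample strings and variable names are assumptions for illustration.

```python
import re

# Sample text (an assumption for illustration only).
text = "<b>one</b> and <b>two</b>"

# Greedy: .+ grabs as much as possible, so it runs to the last </b>.
greedy = re.findall(r"<b>.+</b>", text)

# Lazy: .+? grabs as little as possible, so it stops at the first </b>.
lazy = re.findall(r"<b>.+?</b>", text)

# With word boundaries on both sides, the lazy quantifier still has to
# expand until it reaches the end of each word.
words = re.findall(r"\b\w+?\b", "hello world")

print(greedy)
print(lazy)
print(words)
```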

Q: Does JavaScript regex support Lookbehind?

Modern JavaScript (ES2018 and later) supports both lookbehind and lookahead. Older JavaScript engines support only lookaheads.

Tags:

regex

lookbehind

I am trying to use lookbehinds in a regular expression and it doesn't seem to work as I expected. This is not my real use case, but to simplify, here is an example. Imagine I want to match "example" in a string that says "this is an example". According to my understanding of lookbehinds, this should work:

(?<=this\sis\san\s*?)example

What this should do is find "this is an", then any whitespace characters, and finally match the word "example". It doesn't work and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?

I also tried these two, which work correctly but don't fulfill my needs:

(?<=this\sis\san\s)example

this\sis\san\s*?example
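For comparison, here is a sketch of how the three patterns from this question behave in Python's built-in re module. This is an assumption for illustration, since the question itself was tested on RegExr. The capture group in the last pattern is also an addition that is not in the original expression.

```python
import re

text = "this is an example"

# Variable-width lookbehind: Python's re module refuses to compile it.
try:
    re.compile(r"(?<=this\sis\san\s*?)example")
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print("rejected:", exc)

# Fixed-width lookbehind: accepted, and only "example" is matched.
fixed = re.search(r"(?<=this\sis\san\s)example", text)
print(fixed.group(0))

# No lookbehind: the prefix is part of the overall match, so a capture
# group (added here) is used to isolate "example".
plain = re.search(r"this\sis\san\s*?(example)", text)
print(plain.group(0), "|", plain.group(1))
```

When the engine rejects quantifiers inside a lookbehind, matching the prefix normally and reading a capture group is often the simplest substitute.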

I am using this site to test my regular expressions: http://gskinner.com/RegExr/

asked Jan 27 '12 07:01

Noel De Martin

2 Answers

Many regular expression libraries allow only restricted expressions in lookbehind assertions, such as expressions that:

only match strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
only match strings of fixed lengths: (?<=foobar|\r\n) (each branch with fixed length)
only match strings with an upper bound length: (?<=\s{0,4}) (up to four repetitions)
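To see which of these restrictions a given library applies, you can try compiling one lookbehind of each kind. The sketch below does this for Python's built-in re module, which appears to fall into the first group. The helper name accepted and the trailing x in each pattern are assumptions added for the example.

```python
import re

patterns = [
    r"(?<=foo|bar|\s,\s)x",  # every branch is three characters wide
    r"(?<=foobar|\r\n)x",    # branches with different fixed widths
    r"(?<=\s{0,4})x",        # bounded, but variable width
]

def accepted(pattern):
    """Return True if re compiles the pattern, False if it is rejected."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

results = [accepted(p) for p in patterns]
print(results)
```

Other engines may accept the second or third pattern as well, so the result depends on the library you test with.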

These limitations exist mainly because those libraries can’t process regular expressions backwards at all, or can do so only for a limited subset.

Another reason could be to keep authors from building overly complex regular expressions that are expensive to process because they exhibit so-called pathological behavior (see also ReDoS).

See also the section about limitations of look-behind assertions on Regular-Expressions.info.

answered Sep 21 '22 00:09

Gumbo

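Hey, if you're not using Python and need a variable-length lookbehind assertion, you can trick the regex engine with \K (supported in PCRE and Perl, for example), which discards everything matched so far and starts the reported match over from that point.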

This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k

In short, when you have an expression that has to match first and you only want what comes after it, putting \K after that expression forces the reported match to start over from that point.

Example:

string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'

Matching /(\<a).+?(\<div).+?(\>)\K.+?(?=\<\/div)/ makes the regex restart after the > that closes the opening div tag, so nothing before that point is included in the result. The lookahead (?=\<\/div) makes the engine take everything in front of the closing div tag.
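Python's built-in re module does not support \K, so the sketch below approximates the same idea with a capture group: the prefix is matched normally and only the group is read. The group placement, the variable names, and the lookahead written against the closing </div tag are assumptions for illustration.

```python
import re

string = ('<a this is a tag> with some information '
          '<div this is another tag > LOOK FOR ME </div>')

# Match the prefix up to the ">" that closes the opening div tag, then
# capture lazily until the lookahead sees the closing "</div".
# The capture group stands in for what \K would leave as the match.
match = re.search(r"<a.+?<div.+?>(.+?)(?=</div)", string)

inner = match.group(1)
print(repr(inner))
```

In an engine with \K support the whole match would be that inner text. Here it is available as group 1 instead.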

answered Sep 22 '22 00:09

Leon
