
Java regex look-behind group does not have obvious maximum length error

Tags: java, regex

I know that Java regex does not support variable-length look-behinds, and that the following should cause an error:

(?<=(not exceeding|no((\\w|\\s)*)more than))xxxx

but when the * is replaced with a bounded quantifier, like this:

(?<=(not exceeding|no((\\w|\\s){0,30})more than))xxxx

it still fails. Why is this?

asked Jul 21 '14 at 20:07 by user2559503


1 Answer

Java Lookbehind is Notoriously Buggy

So you thought Java did not support infinite lookbehind?

But the following pattern will compile!

(?<=\d+)\w+

...though when used to find all matches it will yield unexpected results (see demo).
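Since the behavior seems to depend on the Java version, one way to see what a given JVM actually does is to try compiling each pattern and catch PatternSyntaxException. The following is only a minimal sketch: the class name LookbehindProbe and the helper compiles are hypothetical, and the last pattern in the list (a character class [\\w\\s] in place of the group (\\w|\\s)) is an assumption about a variant worth trying, not something taken from the question. No output is shown because it may differ between Java versions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class: reports whether this JVM's regex engine accepts a pattern.
final class LookbehindProbe {

    // Returns true if Pattern.compile accepts the regex, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] patterns = {
            // The two patterns from the question
            "(?<=(not exceeding|no((\\w|\\s)*)more than))xxxx",
            "(?<=(not exceeding|no((\\w|\\s){0,30})more than))xxxx",
            // The unbounded lookbehind from this answer
            "(?<=\\d+)\\w+",
            // Assumption: same bounded quantifier, but on a character class
            "(?<=(not exceeding|no[\\w\\s]{0,30}more than))xxxx",
        };
        for (String p : patterns) {
            System.out.println((compiles(p) ? "compiles: " : "rejected: ") + p);
        }
    }
}
```

Running main on the JVM you care about tells you which of these the engine accepts there, without relying on what another version did.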

On the other hand, you can with success use this other infinite lookbehind (which I found with great surprise on this question)

(?<=\\G\\d+,\\d+,\\d+),

to split this string: 0,123,45,6789,4,5,3,4,6000

It will correctly output (see the online demo):

0,123,45
6789,4,5
3,4,6000

This time the results are what you expect.

But if you tweak the regex the slightest bit to obtain pairs instead of triplets, with

(?<=\\G\\d+,\\d+),

this time it will not split (see the demo).
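To reproduce the triplet and pair experiments locally, a sketch along these lines may help. The class name SplitProbe and the helper trySplit are hypothetical. The helper returns an empty array when the JVM rejects the pattern, so the program keeps running either way. The results are deliberately not listed here, since they may vary with the Java version.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for trying the two \G-based split patterns.
final class SplitProbe {

    // Splits input on regex; returns an empty array if the pattern is rejected.
    static String[] trySplit(String input, String regex) {
        try {
            return input.split(regex);
        } catch (PatternSyntaxException e) {
            return new String[0];
        }
    }

    public static void main(String[] args) {
        String input = "0,123,45,6789,4,5,3,4,6000";
        String[] regexes = {
            "(?<=\\G\\d+,\\d+,\\d+),", // triplets
            "(?<=\\G\\d+,\\d+),",      // pairs
        };
        for (String regex : regexes) {
            String[] parts = trySplit(input, regex);
            System.out.println(regex + " -> " + parts.length + " part(s)");
            for (String part : parts) {
                System.out.println("  " + part);
            }
        }
    }
}
```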


The bottom line

Java lookbehind is notoriously buggy. Knowing this, I recommend you don't waste time trying to understand why it does something that is undocumented.

The decisive words that drove me to this conclusion some time ago are those of Jan Goyvaerts, co-author of the Regular Expressions Cookbook and an arch-regex-guru who has created a terrific regex engine and needs to stay on top of most regex flavors under the sun for his debugging tool RegexBuddy:

"Java has a number of bugs in its lookbehind implementation. Some (but not all) of those were fixed in Java 6."

answered Oct 21 '22 at 09:10 by zx81