java.lang.StackOverflowError while using a RegEx to Parse big strings

This is my regex:

((?:(?:'[^']*')|[^;])*)[;] 

It tokenizes a string on semicolons, ignoring semicolons inside single-quoted sections. For example,

Hello world; I am having a problem; using regex; 

The result is three strings:

Hello world
I am having a problem
using regex
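The question doesn't show the Java code that applies the pattern. A minimal sketch, assuming it loops over Matcher.find() and collects capture group 1 (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SemicolonRegexDemo {
    // The pattern from the question: a token is any run of single-quoted
    // strings or non-semicolon characters, terminated by a semicolon.
    static final Pattern TOKEN = Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    // Collects group 1 of every match, i.e. the text before each semicolon.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello world; I am having a problem; using regex; ")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The second and third tokens keep the space that followed the previous semicolon, and a semicolon inside single quotes, as in a='x;y'; b=2;, does not end a token.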

But when I use a large input string, I get this error:

Exception in thread "main" java.lang.StackOverflowError
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
    at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
    at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
    at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)

What causes this, and how can I solve it?

Ali asked Sep 22 '11 05:09

2 Answers

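Unfortunately, Java's built-in regex engine (java.util.regex) has problems with regexes that repeat a group containing alternatives, that is, (A|B)*. Such a group is matched recursively, roughly one nested chain of calls per repetition, so on a very large input string the recursion exhausts the thread's stack and a StackOverflowError is thrown. The Pattern$Loop, Pattern$GroupHead and Pattern$Branch frames repeating in the stack trace are that recursion.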
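A sketch that reproduces the failure with the question's pattern, assuming the JVM's default thread stack size. The exact token length at which it overflows depends on the JVM version and the -Xss setting, so no threshold is claimed here; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverflowRepro {
    // The question's pattern: (quoted | [^;])* repeats a group containing an
    // alternation, which java.util.regex matches recursively.
    static final Pattern ORIGINAL = Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    // Builds one semicolon-terminated token of the given length.
    static String singleToken(int length) {
        StringBuilder sb = new StringBuilder(length + 1);
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.append(';').toString();
    }

    // Returns true if matching one token of this length overflows the
    // current thread's stack.
    public static boolean overflows(int tokenLength) {
        Matcher m = ORIGINAL.matcher(singleToken(tokenLength));
        try {
            m.find();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 10_000, 1_000_000}) {
            System.out.println(n + " chars: overflow = " + overflows(n));
        }
    }
}
```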

A possible solution is to rewrite your regex so that it does not repeat an alternation. But if your goal is simply to tokenize a string on semicolons, you don't really need a complex regex at all: just use String.split() with ";" as the argument.
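A minimal sketch of such a rewrite, using the "unrolled loop" technique with possessive quantifiers (a substitution of mine, not code from this answer). The pattern ([^';]*+(?:'[^']*'[^';]*+)*+); repeats its group once per quoted section instead of once per character, so runs of plain text are consumed by character-class loops that do not recurse. It assumes single quotes are balanced: unlike the original, an unmatched ' is not accepted as an ordinary character. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnrolledTokenizer {
    // Unrolled form of ((?:'[^']*'|[^;])*);
    //   [^';]*+                 plain text up to a quote or semicolon
    //   (?:'[^']*'[^';]*+)*+    each quoted section plus the plain text after it
    // The repeated group runs once per quoted section, not once per character.
    // Assumes single quotes in the input are balanced.
    static final Pattern TOKEN =
            Pattern.compile("([^';]*+(?:'[^']*'[^';]*+)*+);");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One 1,000,000-character token, long enough to overflow the
        // default stack with the question's pattern.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            big.append('x');
        }
        big.append(';');
        System.out.println(tokenize(big.toString()).get(0).length());
        System.out.println(tokenize("a='x;y'; b=2;"));
    }
}
```

If quoted semicolons never occur in the input, the simpler String.split() route is enough. Note that "a='x;y'; b=2;".split(";") returns the three pieces a='x, y' and  b=2, because split() knows nothing about quotes.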

Jeen Broekstra answered Sep 18 '22 06:09

If you really need to use a regex that overflows your stack, you can increase the size of your stack by passing something like -Xss40m to the JVM.
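-Xss sets the default stack size for Java threads. A narrower alternative, sketched here, is to run only the regex work on a thread created with the four-argument Thread(ThreadGroup, Runnable, String, long stackSize) constructor. The Thread Javadoc describes stackSize as a hint that some platforms ignore, and the 256 MB value and all names below are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BigStackTokenizer {
    // The unchanged pattern from the question.
    static final Pattern TOKEN = Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    // Assumed stack size for the worker thread (256 MB); tune as needed.
    static final long STACK_BYTES = 256L * 1024 * 1024;

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    // Runs tokenize() on a dedicated thread that requests a larger stack,
    // leaving the stack size of every other thread unchanged.
    public static List<String> tokenizeOnBigStack(String input) throws InterruptedException {
        AtomicReference<List<String>> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(null, () -> {
            try {
                result.set(tokenize(input));
            } catch (Throwable t) { // includes StackOverflowError
                failure.set(t);
            }
        }, "regex-worker", STACK_BYTES);
        worker.start();
        worker.join(); // wait for the worker to finish
        if (failure.get() != null) {
            throw new IllegalStateException("tokenizing failed", failure.get());
        }
        return result.get();
    }
}
```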

Andrew answered Sep 18 '22 06:09