I am getting StackOverflowError
when I use
following Reg Ex :
"([A-Z][A-Z]\\d\\d[A-Z]\\[(\\*|(((\\d|\\d\\d)-(\\d|\\d\\d))|(\\d|\\d\\d)))\\](,|$))+";
to match something like this String
:
RA01D[1-1],RA01D[17-17],RA01D[2-2],RA01D[18-18]
Increase Thread Stack Size (-Xss) Increasing the stack size can be useful, for example, when the program involves calling a large number of methods or using lots of local variables. This will set the thread's stack size to 4 mb which should prevent the JVM from throwing a java. lang. StackOverflowError .
StackOverflowError is an error which Java doesn't allow to catch, for instance, stack running out of space, as it's one of the most common runtime errors one can encounter.
Solution. The simplest solution is to carefully inspect the stack trace and detect the repeating pattern of line numbers. These line numbers indicate the code that is being recursively called. Once you detect these lines, look for the terminating condition (base condition) for the recursive calls.
What stribizhev's answer has pointed out and fixed is an inefficiency in the regex. There is no catastrophic backtracking here. The change only slightly delays the StackOverflowError
without resolving it (see Appendix).
In the original regex, if the first branch (\d|\d\d)-(\d|\d\d)
fails, the second branch will do extra work matching (\d|\d\d)
again, which is the prefix of the first branch.
(
(
(\d|\d\d)-(\d|\d\d)
)
|
(\d|\d\d)
)
When re-written (as shown in his answer), the prefix (\d|\d\d)
will only be matched once, and the engine only needs to check the 2 different sequels (matching -(\d|\d\d)
or just an empty string).
(\d|\d\d)(?:-(\d|\d\d))?
This is how his answer improves on the efficiency of the regex. The same method is applied to \d|\d\d
.
Back to the problem of StackOverflowError
. If you run the regex on a string with 1000 elements, any of the regex above will cause StackOverflowError
. It is due to the implementation of Pattern class in Sun/Oracle/OpenJDK, which uses recursion for greedy and lazy quantifier.
Since the regex is non-ambiguous, you can fix the problem by making the quantifier on the outer most group possessive. The regex is copied from stribizhev's answer with some modifications:
"(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++"
^^
Since the implementation uses a loop to implement possessive quantifier (as there is no need to backtrack), StackOverflowError
cannot occur, regardless of how long the input string is. The stack usage is only as much as a single repetition, as opposed to the case in the question, where it grows linearly to the number of elements in the string.
Below is a test program showing the number of elements that the regex can handle. On my system (Oracle JRE, version 1.8.0_25), the regex in the question only manages to reach 104 * 4 = 416 elements before crashing, stribizhev's answer manages to reach 137 * 4 = 548, stribizhev's answer modified to remove unnecessary group manages to reach 150 * 4 = 600, and the version with possessive quantifier passes all tests (500 * 4 = 2000 elements).
public class SO29758814 {
public static void main(String[] args) {
String s = "";
for (int i = 1; i <= 500; i++) {
s += "RA01D[1-1],RA01D[17-17],RA01D[2-2],RA01D[18-18],";
System.out.print(i);
try {
// Question
System.out.print(" 1 " + s.matches("([A-Z][A-Z]\\d\\d[A-Z]\\[(\\*|(((\\d|\\d\\d)-(\\d|\\d\\d))|(\\d|\\d\\d)))\\](,|$))+"));
} catch (Throwable e) { }
try {
// stribizhev's answer
System.out.print(" 2 " + s.matches("([A-Z]{2}\\d{2}[A-Z]\\[(\\*|((\\d{1,2})(?:-(\\d{1,2}))?))\\](?:,|$))+"));
} catch (Throwable e) { }
try {
// stribizhev's answer, remove unnecessary groups
System.out.print(" 3 " + s.matches("(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))+"));
} catch (Throwable e) { }
try {
// stribizhev's answer, remove unnecessary groups, use possessive quantifier
System.out.print(" 4 " + s.matches("(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++"));
} catch (Throwable e) { }
System.out.println();
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With