Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

java.lang.StackOverflowError at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3715)

Tags:

java

regex

linux

I am getting StackOverflowError when I use following Reg Ex :

"([A-Z][A-Z]\\d\\d[A-Z]\\[(\\*|(((\\d|\\d\\d)-(\\d|\\d\\d))|(\\d|\\d\\d)))\\](,|$))+";

to match something like this String:

RA01D[1-1],RA01D[17-17],RA01D[2-2],RA01D[18-18]
like image 740
Julie D'Mello Avatar asked Apr 20 '15 21:04

Julie D'Mello


People also ask

How do I fix Java Lang StackOverflowError?

Increase Thread Stack Size (-Xss) Increasing the stack size can be useful, for example, when the program involves calling a large number of methods or using lots of local variables. This will set the thread's stack size to 4 mb which should prevent the JVM from throwing a java. lang. StackOverflowError .

What is StackOverflowError in Java?

StackOverflowError is an error which Java doesn't allow to catch, for instance, stack running out of space, as it's one of the most common runtime errors one can encounter.

How does Java handle StackOverflowError?

Solution. The simplest solution is to carefully inspect the stack trace and detect the repeating pattern of line numbers. These line numbers indicate the code that is being recursively called. Once you detect these lines, look for the terminating condition (base condition) for the recursive calls.


1 Answers

What stribizhev's answer has pointed out and fixed is an inefficiency in the regex. There is no catastrophic backtracking here. The change only slightly delays the StackOverflowError without resolving it (see Appendix).

In the original regex, if the first branch (\d|\d\d)-(\d|\d\d) fails, the second branch will do extra work matching (\d|\d\d) again, which is the prefix of the first branch.

(
  (
    (\d|\d\d)-(\d|\d\d)
  )
  |
  (\d|\d\d)
)

When re-written (as shown in his answer), the prefix (\d|\d\d) will only be matched once, and the engine only needs to check the 2 different sequels (matching -(\d|\d\d) or just an empty string).

(\d|\d\d)(?:-(\d|\d\d))?

This is how his answer improves on the efficiency of the regex. The same method is applied to \d|\d\d.


Back to the problem of StackOverflowError. If you run the regex on a string with 1000 elements, any of the regex above will cause StackOverflowError. It is due to the implementation of Pattern class in Sun/Oracle/OpenJDK, which uses recursion for greedy and lazy quantifier.

Since the regex is non-ambiguous, you can fix the problem by making the quantifier on the outer most group possessive. The regex is copied from stribizhev's answer with some modifications:

"(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++"
                                                                     ^^

Since the implementation uses a loop to implement possessive quantifier (as there is no need to backtrack), StackOverflowError cannot occur, regardless of how long the input string is. The stack usage is only as much as a single repetition, as opposed to the case in the question, where it grows linearly to the number of elements in the string.

Appendix

Test program

Below is a test program showing the number of elements that the regex can handle. On my system (Oracle JRE, version 1.8.0_25), the regex in the question only manages to reach 104 * 4 = 416 elements before crashing, stribizhev's answer manages to reach 137 * 4 = 548, stribizhev's answer modified to remove unnecessary group manages to reach 150 * 4 = 600, and the version with possessive quantifier passes all tests (500 * 4 = 2000 elements).

public class SO29758814 {
    public static void main(String[] args) {
        String s = "";
        for (int i = 1; i <= 500; i++) {

            s += "RA01D[1-1],RA01D[17-17],RA01D[2-2],RA01D[18-18],";

            System.out.print(i);

            try {
                // Question
                System.out.print(" 1 " + s.matches("([A-Z][A-Z]\\d\\d[A-Z]\\[(\\*|(((\\d|\\d\\d)-(\\d|\\d\\d))|(\\d|\\d\\d)))\\](,|$))+"));
            } catch (Throwable e) { }

            try {
                // stribizhev's answer
                System.out.print(" 2 " + s.matches("([A-Z]{2}\\d{2}[A-Z]\\[(\\*|((\\d{1,2})(?:-(\\d{1,2}))?))\\](?:,|$))+"));
            } catch (Throwable e) { }

            try {
                // stribizhev's answer, remove unnecessary groups
                System.out.print(" 3 " + s.matches("(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))+"));
            } catch (Throwable e) { }

            try {
                // stribizhev's answer, remove unnecessary groups, use possessive quantifier
                System.out.print(" 4 " + s.matches("(?:[A-Z][A-Z]\\d\\d[A-Z]\\[(?:\\*|\\d{1,2}(?:-\\d{1,2})?)\\](?:,|$))++"));
            } catch (Throwable e) { }

            System.out.println();

        }
    }
}
like image 172
nhahtdh Avatar answered Sep 25 '22 21:09

nhahtdh