I am attempting to use a regular expression with Scanner to match a string from a file. The regex works with all of the contents of the file except for this line:
DNA="ITTTAITATIATYAAAYIYI[....]ITYTYITTIYAIAIYIT"
In the actual file, the ellipsis represents several thousand more characters.
When the loop that reads the file reaches the line containing the bases, a stack overflow error occurs.
Here is the loop:
while (scanFile.hasNextLine()) {
    final String currentLine = scanFile.findInLine(".*");
    System.out.println("trying to match '" + currentLine + "'");
    Scanner internalScanner = new Scanner(currentLine);
    String matchResult = internalScanner.findInLine(Constants.ANIMAL_INFO_REGEX);
    assert matchResult != null : "there's no reason not to find a match";
    matches.put(internalScanner.match().group(1), internalScanner.match().group(2));
    scanFile.nextLine();
}
and the regex:
static final String ANIMAL_INFO_REGEX = "([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)";
Here's the failure trace:
java.lang.StackOverflowError
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3360)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
...etc (it's all regex).
Thanks so much!
Increase the thread stack size (-Xss). Increasing the stack size can be useful, for example, when the program makes a large number of nested method calls or uses lots of local variables. Passing -Xss4m, for instance, sets the thread stack size to 4 MB, which may prevent the JVM from throwing a java.lang.StackOverflowError.
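As a rough sketch of the same idea without touching JVM flags, the match can be run on a thread created with a larger requested stack. The class and method names below (BigStackRegex, matchesWithStack) are made up for illustration, the regex is the one from the question with a closing quote appended so that matches() can consume the whole line, and the stack size passed to the Thread constructor is only a hint that some platforms ignore.

```java
import java.util.regex.Pattern;

class BigStackRegex {
    // Runs Pattern.matches on a worker thread created with the requested stack size.
    // Returns null if the match still overflowed the stack or the wait was interrupted.
    static Boolean matchesWithStack(String regex, String input, long stackBytes) {
        Boolean[] result = new Boolean[1];
        Runnable task = () -> {
            try {
                result[0] = Pattern.matches(regex, input);
            } catch (StackOverflowError e) {
                result[0] = null; // the stack was still too small
            }
        };
        // The last argument is a stack size hint; the JVM is free to ignore it.
        Thread worker = new Thread(null, task, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Regex from the question, plus a closing quote (an adaptation for matches()).
        String regex = "([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)\"";
        // Stand-in for the long DNA line; the real data is not shown in the question.
        StringBuilder line = new StringBuilder("DNA=\"");
        for (int i = 0; i < 20_000; i++) {
            line.append("ITAY".charAt(i % 4));
        }
        line.append('"');
        System.out.println(matchesWithStack(regex, line.toString(), 256L * 1024 * 1024));
    }
}
```

This only buys headroom: a longer line can still overflow, so rewriting the regex is the more robust fix.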
A common cause of StackOverflowError is a recursive function that lacks a proper terminating condition, so the recursion never stops and the stack eventually fills up.
This looks like bug 5050507. I agree with Asaph that removing the alternation should help; the bug specifically says "Avoid alternation whenever possible". I think you can probably go even simpler:
"^([a-zA-Z]+) *= *\"([^\"]+)"
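A minimal sketch for trying that simplified pattern against a long line. The class and method names (SimplifiedRegexDemo, parse, longDnaLine) are hypothetical, and the generated DNA string is only a stand-in for the real file contents, which the question does not show.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimplifiedRegexDemo {
    // The simplified pattern: a negated character class instead of a repeated alternation.
    static final Pattern SIMPLIFIED =
            Pattern.compile("^([a-zA-Z]+) *= *\"([^\"]+)");

    // Returns {key, value}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = SIMPLIFIED.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    // Builds a line of the form DNA="...." with the requested number of bases.
    static String longDnaLine(int bases) {
        StringBuilder sb = new StringBuilder("DNA=\"");
        String alphabet = "ITAY";
        for (int i = 0; i < bases; i++) {
            sb.append(alphabet.charAt(i % alphabet.length()));
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String[] kv = parse(longDnaLine(100_000));
        System.out.println(kv[0] + " -> " + kv[1].length() + " characters");
    }
}
```

Note that [^\"]+ accepts any character except a double quote, so it is more permissive than the original character set.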
Try this simplified version of your regex, which removes some unnecessary | operators (which might have been causing the regex engine to do a lot of branching) and includes beginning- and end-of-line anchors.
static final String ANIMAL_INFO_REGEX = "^([a-zA-Z]+) *= *\"([a-zA-Z_. ]+)\"$";
Read this to understand the problem: http://www.regular-expressions.info/catastrophic.html ... and then use one of the other suggestions.
As the others have said, your regex is much less efficient than it should be. I'd take it a step further and use possessive quantifiers:
"^([a-zA-Z]++) *+= *+\"([^\"]++)\"$"
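To illustrate what a possessive quantifier changes, here is a small sketch (the class and method names are made up for the example). A possessive quantifier never gives back what it has consumed, which is why it is paired with a negated class that cannot swallow the closing quote in the first place.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Greedy: .* first takes the whole input, then backtracks one character to find the quote.
    static boolean greedy(String s) {
        return Pattern.matches(".*\"", s);
    }

    // Possessive: .*+ takes the whole input, including the quote, and never backtracks.
    static boolean possessiveDot(String s) {
        return Pattern.matches(".*+\"", s);
    }

    // Possessive on a negated class: [^"]++ stops before the quote, so no backtracking is needed.
    static boolean possessiveNegated(String s) {
        return Pattern.matches("[^\"]++\"", s);
    }

    public static void main(String[] args) {
        String input = "abc\"";
        System.out.println("greedy .*\"        : " + greedy(input));
        System.out.println("possessive .*+\"   : " + possessiveDot(input));
        System.out.println("possessive [^\"]++\" : " + possessiveNegated(input));
    }
}
```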
But the way you're using the Scanner doesn't make much sense, either. There's no need to use findInLine(".*") to read the line; that's what nextLine() does. And you don't need to create another Scanner to apply your regex; just use a Matcher.
static final Pattern ANIMAL_INFO_PATTERN =
    Pattern.compile("^([a-zA-Z]++) *+= *+\"([^\"]++)\"$");
...
Matcher lineMatcher = ANIMAL_INFO_PATTERN.matcher("");
while (scanFile.hasNextLine()) {
    String currentLine = scanFile.nextLine();
    if (lineMatcher.reset(currentLine).matches()) {
        matches.put(lineMatcher.group(1), lineMatcher.group(2));
    }
}
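For anyone who wants to run this end to end, here is a self-contained sketch built around the same loop. The wrapper class AnimalInfoReader, the read method, the use of a LinkedHashMap for matches, and the sample input are assumptions added for illustration; the Scanner reads from a String here rather than from the original file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnimalInfoReader {
    static final Pattern ANIMAL_INFO_PATTERN =
            Pattern.compile("^([a-zA-Z]++) *+= *+\"([^\"]++)\"$");

    // Reads key="value" lines and skips anything that does not match the pattern.
    static Map<String, String> read(Scanner scanFile) {
        Map<String, String> matches = new LinkedHashMap<>();
        Matcher lineMatcher = ANIMAL_INFO_PATTERN.matcher(""); // reused for every line
        while (scanFile.hasNextLine()) {
            String currentLine = scanFile.nextLine();
            if (lineMatcher.reset(currentLine).matches()) {
                matches.put(lineMatcher.group(1), lineMatcher.group(2));
            }
        }
        return matches;
    }

    // Builds sample input, including a long DNA line standing in for the real data.
    static String sampleInput(int bases) {
        StringBuilder dna = new StringBuilder();
        for (int i = 0; i < bases; i++) {
            dna.append("ITAY".charAt(i % 4));
        }
        return "name = \"Rex\"\n"
                + "not a key/value line\n"
                + "DNA=\"" + dna + "\"\n";
    }

    public static void main(String[] args) {
        Map<String, String> matches = read(new Scanner(sampleInput(50_000)));
        for (Map.Entry<String, String> e : matches.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().length() + " characters");
        }
    }
}
```

Lines that do not match are silently skipped in this sketch; the original code asserted that every line matches, so add a check there if that is the behavior you want.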