I am new to Java Regular expression. We are using a pattern for matching a string. We are using this for validating a text field and it meets our requirements. But there is a performance issue in the matching.
Pattern : ([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+
Our problem is, for certain input strings the CPU time goes high and causes hanging the threads. Also we get exceptions. Can anyone please help me to optimize the Pattern or suggest a new pattern to solve my issue.
Exception details
============================================
Hung thread details, all the same:
[9/28/11 11:40:07:320 CDT] 00000003 ThreadMonitor W WSVR0605W: Thread "WebContainer : 26" (0000004f) has been active for 709755 mi
lliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3938)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4323)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4263)
at java.util.regex.Matcher.match(Matcher.java:1139)
at java.util.regex.Matcher.matches(Matcher.java:514)
Your problem is catastrophic backtracking which you're getting because you have nested quantifiers. This starts becoming a problem when the text doesn't match the requirements because of the exponentially increasing number of permutations the regex engine has to go through to declare failure.
([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+
^ ^
| repetition
repetition
Reconstruct your regex like this:
(?i)^(?!.*(?:--|__))[A-Z0-9][\w-]*(?: [\w-]+)*$
Java, with explanation:
boolean foundMatch = subjectString.matches(
"(?ix) # Case-insensitive, multiline regex:\n" +
"^ # Start of string\n" +
"(?! # Assert that it's impossible to match\n" +
" .* # any number of characters\n" +
" (?:--|__) # followed by -- or __\n" +
") # End of lookahead assertion\n" +
"[A-Z0-9] # Match A-Z, a-z or 0-9\n" +
"[\\w-]* # Match 0 or more alnums/dash\n" +
"(?: # Match the following:\n" +
" [\\ ] # a single space\n" +
" [\\w-]+ # followed by one or more alnums or -\n" +
")* # any number of times\n" +
"$ # End of string");
Note that the string must not end in a space as per your requirements. In case you're wondering, \w
is a shorthand for [A-Za-z0-9_]
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With