I have the following regexp, which never finishes evaluating and hangs indefinitely:
import java.util.regex.Matcher
String AUTOGENERATED_HEADER = "#-=-=-= AUTOGENERATED HEADER =-=-=-"
String AUTOGENERATED_FOOTER = "#-=-=-= AUTOGENERATED FOOTER =-=-=-"
String messages = '''#-=-=-= AUTOGENERATED HEADER =-=-=-
a=b
c=d
x=y
#-=-=-= AUTOGENERATED FOOTER =-=-=-
'''
Matcher matcher = messages =~ /${AUTOGENERATED_HEADER}[\r\n]+((.*[\r\n]*)*)${AUTOGENERATED_FOOTER}/
matcher.find()
The problem is with the part (.*[\r\n]*). When I change it to (.*[\r\n]+), it works.
You can experiment with the regexp here. Can anybody explain how this is possible?
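What you have here is a case of catastrophic backtracking. See your regex demo. The culprit is the (.*[\r\n]*)* part, which is enclosed by other subpatterns. Because [\r\n]* can match nothing, the same run of characters can be divided between .* and the outer * in exponentially many ways, and the engine tries them all before it gives a character back to the rest of the pattern. With (.*[\r\n]+) every pass of the loop must end at a line break, so there is only one way to split the text. You can see the excessive backtracking caused by the nested quantifiers on the regex debugger page at regex101.com.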
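To get a feel for the numbers, here is a rough model in Java (the engine behind Groovy's =~ is java.util.regex). It is not a trace of the real engine. It only counts how many ways a run of n non-newline characters can be cut into consecutive non-empty chunks, where each chunk stands for one pass of the outer loop with [\r\n]* matching nothing. The class and method names are made up for this illustration.

```java
public class SplitCount {
    // Counts the ways to cut n characters into consecutive non-empty chunks.
    // Each chunk models one pass of the outer loop in (.*[\r\n]*)* where
    // .* takes the chunk and [\r\n]* matches the empty string.
    static long countSplits(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // nothing left to split: exactly one way (stop looping)
        for (int len = 1; len <= n; len++) {
            // Choose the length of the first chunk, then split the remainder.
            for (int first = 1; first <= len; first++) {
                ways[len] += ways[len - first];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        // 35 is the length of the footer line in the question.
        for (int n : new int[] {5, 10, 20, 35}) {
            System.out.println(n + " chars -> " + countSplits(n) + " splits");
        }
    }
}
```

The count doubles with every extra character (2^(n-1) for n characters), which is why a single 35-character line is already enough to make the match appear to hang.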
A solution is to either use lazy dot matching (replace [\r\n]+((.*[\r\n]*)*) with .*? and add the (?s) modifier at the start of the pattern) or use an unrolled version (which is much better for long inputs, but requires some hardcoding).
See (?s)#-=-=-= AUTOGENERATED HEADER =-=-=-.*?#-=-=-= AUTOGENERATED FOOTER =-=-=- in action. Use
Matcher matcher = messages =~ /(?s)${AUTOGENERATED_HEADER}.*?${AUTOGENERATED_FOOTER}/
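For comparison, here is a minimal sketch of both options in plain Java, again relying on java.util.regex. The unrolled pattern is one possible way to write it, assuming the footer always starts with # (that is the hardcoded part): it consumes non-# characters in bulk and only stops at a # to check whether the footer follows. The class name, field names and the firstMatch helper are hypothetical, and Pattern.quote is used only as a precaution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderFooterMatch {
    static final String HEADER = "#-=-=-= AUTOGENERATED HEADER =-=-=-";
    static final String FOOTER = "#-=-=-= AUTOGENERATED FOOTER =-=-=-";

    // Option 1: lazy dot matching; (?s) lets the dot match line breaks too.
    static final Pattern LAZY = Pattern.compile(
            "(?s)" + Pattern.quote(HEADER) + ".*?" + Pattern.quote(FOOTER));

    // Option 2: unrolled loop. [^#]* eats everything up to the next '#';
    // a '#' is consumed only if it does not start the footer.
    // Assumption: the footer begins with '#', so substring(1) is the rest of it.
    static final Pattern UNROLLED = Pattern.compile(
            Pattern.quote(HEADER)
            + "[^#]*(?:#(?!" + Pattern.quote(FOOTER.substring(1)) + ")[^#]*)*"
            + Pattern.quote(FOOTER));

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String messages = HEADER + "\na=b\nc=d\nx=y\n" + FOOTER + "\n";
        System.out.println(firstMatch(LAZY, messages));
        System.out.println(firstMatch(UNROLLED, messages));
    }
}
```

Both patterns return the whole block from header to footer. Note that neither keeps the capturing group from the original regexp, so wrap the middle part in parentheses if you need the body on its own.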