I am aware that this is a corner case, but I have come across code that uses a regular expression with a variable number of groups.
According to the docs, this is legal:
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
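To make the quoted behaviour concrete, here is a minimal sketch that runs the documentation's own example. The class name GroupRetentionDemo and the helper lastGroups are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupRetentionDemo {
    // Hypothetical helper: returns group(0)..group(groupCount()) after a
    // successful full match, or null when the input does not match.
    static String[] lastGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = lastGroups("(a(b)?)+", "aba");
        // Group 1 holds what the last iteration of the outer group matched.
        System.out.println(g[1]);
        // Group 2 failed on the last iteration, so the earlier "b" is retained.
        System.out.println(g[2]);
    }
}
```

The group count stays fixed at 2 regardless of how many times the quantified group repeats; only the captured text changes between iterations.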
However, when I try this with the Unicode character 'GRINNING FACE WITH SMILING EYES' (U+1F601), I get a StringIndexOutOfBoundsException.
Is that expected according to the spec, or is it a bug?
Here is the test code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestEmoji {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(A.)* EEE");
        testGroups(pattern, "ACAB EEE");
        testGroups(pattern, "ABACA\uD83D\uDE01");
    }

    public static void testGroups(Pattern pattern, String s) {
        Matcher matcher = pattern.matcher(s);
        if (matcher.matches()) {
            System.out.println("matches");
            System.out.println(matcher.groupCount());
            for (int i = 1; i <= matcher.groupCount(); ++i) {
                System.out.println(matcher.group(i));
            }
        }
    }
}
And here is the output, ending with the exception:
matches
1
AB
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at java.lang.String.charAt(String.java:658)
at java.util.regex.Pattern$Slice.match(Pattern.java:3867)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4382)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4354)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:4304)
at java.util.regex.Matcher.match(Matcher.java:1221)
at java.util.regex.Matcher.matches(Matcher.java:559)
at TestEmoji.testGroups(TestEmoji.java:19)
at TestEmoji.main(TestEmoji.java:12)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
After some digging in the Java Bugs database, I found that this is a known bug:
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8007395
JDK-8007395 : StringIndexOutofBoundsException in Match.find() when input String contains surrogate UTF-16 characters
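For context on the "surrogate UTF-16 characters" in the bug title, here is a small sketch showing that U+1F601 occupies two char units in a Java String while counting as a single code point for the regex engine. The class name SurrogateDemo and its helpers are hypothetical. It only illustrates the encoding; it does not reproduce or work around the bug.

```java
import java.util.regex.Pattern;

public class SurrogateDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String face = "\uD83D\uDE01"; // U+1F601 as a surrogate pair

        System.out.println(utf16Units(face));                          // 2
        System.out.println(codePoints(face));                          // 1
        System.out.println(Integer.toHexString(face.codePointAt(0)));  // 1f601
        System.out.println(Character.isHighSurrogate(face.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(face.charAt(1)));  // true

        // A single '.' consumes the whole code point, i.e. both chars.
        System.out.println(Pattern.matches(".", face));                // true
    }
}
```

So in the test input above, the "." in (A.) can consume two chars rather than one, which is the situation the bug report describes.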