I recently harnessed the power of a look-ahead regular expression to split a String:
"abc8".split("(?=\\d)|\\W")
When printed to the console (for example with Arrays.toString), this expression returns:
[abc, 8]
Pleased with this result, I wanted to move it to Guava's Splitter for further development, which looked like this:
Splitter.onPattern("(?=\\d)|\\W").split("abc8")
To my surprise the output changed to:
[abc]
Why?
String myString = "Jane-Doe";
String[] splitString = myString.split("-");
We can simply use a character or substring instead of an actual regular expression. Of course, some characters have a special meaning in a regex, and we need to escape them if we want their literal value.
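A short sketch of the escaping point above, using only the JDK. The class name SplitEscapeDemo and the helper splitLiteral are illustrative names, not part of any library; Pattern.quote is the standard java.util.regex method for turning an arbitrary string into a literal pattern.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitEscapeDemo {

    // Hypothetical helper: splits on a literal delimiter by quoting it,
    // so none of its characters are treated as regex metacharacters.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // "-" is not special outside a character class, so it works unescaped.
        System.out.println(Arrays.toString("Jane-Doe".split("-")));      // [Jane, Doe]

        // "." means "any character" in a regex; escape it to split on a literal dot.
        System.out.println(Arrays.toString("a.b.c".split("\\.")));       // [a, b, c]

        // Unescaped, every character is a delimiter: all tokens are empty,
        // trailing empty strings are removed, and the array ends up empty.
        System.out.println("a.b.c".split(".").length);                   // 0

        // Pattern.quote also handles "|", which otherwise means alternation.
        System.out.println(Arrays.toString(splitLiteral("1|2|3", "|"))); // [1, 2, 3]
    }
}
```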
Java's String.split() method splits a string into an array of strings around matches of a regular expression (or a plain delimiter). The result is an array containing the split substrings. An optional limit argument controls how many times the pattern is applied, and therefore the maximum number of elements in the returned array.
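A minimal sketch of how the limit argument of String.split(String, int) behaves. SplitLimitDemo and its show helper are illustrative names only; the input "a,b,c,," is chosen so the trailing empty strings make the difference between the limits visible.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Hypothetical helper: formats the result of String.split(regex, limit) for printing.
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String csv = "a,b,c,,";

        // limit 0 (what split(regex) uses): apply the pattern as often as possible,
        // then drop trailing empty strings.
        System.out.println(show(csv, ",", 0));  // [a, b, c]

        // Positive limit: at most that many elements; the last one keeps the rest of the input.
        System.out.println(show(csv, ",", 2));  // [a, b,c,,]

        // Negative limit: apply the pattern as often as possible and keep trailing empty strings.
        System.out.println(show(csv, ",", -1)); // [a, b, c, , ]
    }
}
```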
You found a bug!
Splitter s = Splitter.onPattern("(?=\\d)|\\W");
System.out.println(s.split("abc82")); // [abc, 8]  (the trailing "2" is lost)
System.out.println(s.split("abc8"));  // [abc]
This is the method that Splitter uses to actually split Strings (Splitter.SplittingIterator::computeNext):
@Override
protected String computeNext() {
  /*
   * The returned string will be from the end of the last match to the
   * beginning of the next one. nextStart is the start position of the
   * returned substring, while offset is the place to start looking for a
   * separator.
   */
  int nextStart = offset;
  while (offset != -1) {
    int start = nextStart;
    int end;

    int separatorPosition = separatorStart(offset);
    if (separatorPosition == -1) {
      end = toSplit.length();
      offset = -1;
    } else {
      end = separatorPosition;
      offset = separatorEnd(separatorPosition);
    }
    if (offset == nextStart) {
      /*
       * This occurs when some pattern has an empty match, even if it
       * doesn't match the empty string -- for example, if it requires
       * lookahead or the like. The offset must be increased to look for
       * separators beyond this point, without changing the start position
       * of the next returned substring -- so nextStart stays the same.
       */
      offset++;
      if (offset >= toSplit.length()) {
        offset = -1;
      }
      continue;
    }

    while (start < end && trimmer.matches(toSplit.charAt(start))) {
      start++;
    }
    while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
      end--;
    }

    if (omitEmptyStrings && start == end) {
      // Don't include the (unused) separator in next split string.
      nextStart = offset;
      continue;
    }

    if (limit == 1) {
      // The limit has been reached, return the rest of the string as the
      // final item. This is tested after empty string removal so that
      // empty strings do not count towards the limit.
      end = toSplit.length();
      offset = -1;
      // Since we may have changed the end, we need to trim it again.
      while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
        end--;
      }
    } else {
      limit--;
    }

    return toSplit.subSequence(start, end).toString();
  }
  return endOfData();
}
The area of interest is:
if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset >= toSplit.length()) {
    offset = -1;
  }
  continue;
}
This logic works, unless the empty match happens right before the last character of the String. In that case offset is incremented to toSplit.length(), the >= check sets it to -1, and the loop ends without ever returning that last character. What this part should look like instead (note >= changed to >):
if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset > toSplit.length()) {
    offset = -1;
  }
  continue;
}
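To see the off-by-one in isolation without pulling in Guava, here is a minimal sketch: a hypothetical, simplified model of the computeNext loop (the class EmptyMatchSplitSketch and its split method are made-up names; trimming, omitEmptyStrings and limit are left out). It assumes, as in Guava's pattern-based Splitter, that separatorStart corresponds to Matcher.find(offset) followed by Matcher.start(), and separatorEnd to Matcher.end(). A flag switches between the original >= check and the proposed >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of Splitter.SplittingIterator.computeNext.
// Not Guava's API: no trimming, no omitEmptyStrings, no limit.
public class EmptyMatchSplitSketch {

    static List<String> split(Pattern separator, String toSplit, boolean useOriginalCheck) {
        List<String> pieces = new ArrayList<>();
        Matcher matcher = separator.matcher(toSplit);
        int offset = 0;
        // Each pass of the outer loop corresponds to one computeNext() call.
        while (offset != -1) {
            int nextStart = offset;
            String piece = null;
            while (offset != -1) {
                int start = nextStart;
                int end;
                // Matcher.find(int) resets the matcher and searches from the given index.
                int separatorPosition = matcher.find(offset) ? matcher.start() : -1;
                if (separatorPosition == -1) {
                    end = toSplit.length();
                    offset = -1;
                } else {
                    end = separatorPosition;
                    offset = matcher.end();
                }
                if (offset == nextStart) {
                    // Empty match where the next piece starts: search further along
                    // without moving nextStart.
                    offset++;
                    boolean pastEnd = useOriginalCheck
                            ? offset >= toSplit.length() // original check
                            : offset > toSplit.length(); // proposed fix
                    if (pastEnd) {
                        offset = -1;
                    }
                    continue;
                }
                piece = toSplit.substring(start, end);
                break;
            }
            if (piece != null) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        Pattern separator = Pattern.compile("(?=\\d)|\\W");
        for (String input : new String[] {"abc8", "abc82"}) {
            System.out.println(input + " with >= : " + split(separator, input, true));
            System.out.println(input + " with >  : " + split(separator, input, false));
        }
    }
}
```

With >= this model returns [abc] for "abc8" and [abc, 8] for "abc82", matching the Splitter output above; with > it returns [abc, 8] and [abc, 8, 2]. When offset equals toSplit.length(), Matcher.find(int) is still a legal call and simply finds no further separator, so the remaining character is returned as the final piece.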