
Why does "split" on an empty string return a non-empty array?

Tags: java, scala



If you split an orange zero times, you have exactly one piece - the orange.


The Java and Scala split methods operate in two steps like this:

  • First, split the string by the delimiter. As a natural consequence, if the string does not contain the delimiter, the result is a singleton array containing just the input string.
  • Second, remove all trailing empty strings. This is why ",,,".split(",") returns an empty array.
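The two steps above can be observed directly with String.split. The following is a minimal sketch; the class name SplitSteps and the helper split are made up for illustration and only wrap the standard library call.

```java
import java.util.Arrays;

class SplitSteps {
    // Hypothetical helper: a thin wrapper over String.split (the delimiter is a regex).
    static String[] split(String input, String delimiter) {
        return input.split(delimiter);
    }

    public static void main(String[] args) {
        // No delimiter in the input: a singleton array holding the input itself.
        System.out.println(Arrays.toString(split("abc", ",")));      // [abc]

        // Step one yields four empty strings; step two strips all of them.
        System.out.println(split(",,,", ",").length);                // 0

        // Only trailing empty strings are stripped; leading and inner ones stay.
        System.out.println(Arrays.toString(split(",a,,b,,", ",")));  // [, a, , b]
    }
}
```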

According to this, the result of "".split(",") should be an empty array because of the second step, right?

It should, but it does not: "".split(",") returns an array with one element, the empty string. Unfortunately, this is an artificially introduced corner case. That is bad, but at least it is documented in java.util.regex.Pattern, if you remember to look at the documentation:

For n == 0, the result is as for n < 0, except trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)
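The corner case is easy to check. In this small sketch (the class name EmptyInputSplit and the helper pieces are illustrative only), the empty input yields one element, while an input consisting of just the delimiter yields none:

```java
class EmptyInputSplit {
    // Hypothetical helper: number of pieces produced by splitting on a comma.
    static int pieces(String input) {
        return input.split(",").length;
    }

    public static void main(String[] args) {
        String[] parts = "".split(",");

        // The empty input is special: the result is a singleton array...
        System.out.println(parts.length);        // 1
        // ...whose only element is the empty string itself.
        System.out.println(parts[0].isEmpty());  // true

        // By contrast, "," splits into two empty strings, and both are stripped.
        System.out.println(pieces(","));         // 0
    }
}
```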

Solution 1: Always pass -1 as the second parameter

So, I advise you to always pass -1 as the second parameter (the limit), which skips step two above, unless you know specifically what you want to achieve or you are sure that your program will never receive an empty string as input.
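As a rough illustration of what a negative limit changes, the sketch below compares a few inputs. The class name KeepTrailingEmpties and the helper splitKeepingAll are assumptions made for this example, not part of any library.

```java
import java.util.Arrays;

class KeepTrailingEmpties {
    // Hypothetical helper: a negative limit tells split to keep trailing empty strings.
    static String[] splitKeepingAll(String input, String delimiter) {
        return input.split(delimiter, -1);
    }

    public static void main(String[] args) {
        // Three delimiters separate four (empty) pieces, and none are dropped.
        System.out.println(splitKeepingAll(",,,", ",").length);                // 4

        // Trailing empty strings survive, so the piece count matches the delimiters.
        System.out.println(Arrays.toString(splitKeepingAll("a,b,,", ",")));    // [a, b, , ]

        // The empty input still yields one element, same as without the limit.
        System.out.println(splitKeepingAll("", ",").length);                   // 1
    }
}
```

With -1, the number of pieces is always the number of delimiter matches plus one, which makes the empty-string result consistent with every other input.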

Solution 2: Use Guava Splitter class

If you are already using Guava in your project, you can try the Splitter class. It has a rich API and makes your code easy to understand.

Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"

Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you will get an array of size 1 that is holding the original string, even if it is empty.


For the same reason that

",test" split ','

and

",test," split ','

will return an array of size 2. Everything before the first match is returned as the first element.
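A Java rendering of the two Scala expressions above, as a minimal sketch (the class name LeadingEmptyPiece and the helper split are illustrative only):

```java
import java.util.Arrays;

class LeadingEmptyPiece {
    // Hypothetical helper wrapping String.split on a comma.
    static String[] split(String input) {
        return input.split(",");
    }

    public static void main(String[] args) {
        // The text before the first comma is empty, and it is kept as element 0.
        System.out.println(Arrays.toString(split(",test")));   // [, test]

        // The trailing empty string is stripped, so the size is still 2.
        System.out.println(Arrays.toString(split(",test,")));  // [, test]
    }
}
```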


"a".split(",") -> "a" therefore "".split(",") -> ""


In all programming languages I know, a blank string is still a valid String. So splitting it on any delimiter returns a single-element array whose only element is the blank String. If it were a null (not blank) String, that would be a different issue.
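To make the blank-versus-null distinction concrete in Java, here is a small sketch. The class name NullVersusEmpty and both helpers are hypothetical; the only library call used is String.split.

```java
class NullVersusEmpty {
    // Hypothetical helper: counts the pieces; throws NullPointerException for null input.
    static int countPieces(String input) {
        return input.split(",").length;
    }

    // Hypothetical helper: reports whether splitting a null reference fails.
    static boolean throwsOnNull() {
        try {
            countPieces(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A blank string is a valid String and splits into one (blank) piece.
        System.out.println(countPieces(""));  // 1

        // A null reference has no split method to call at all.
        System.out.println(throwsOnNull());   // true
    }
}
```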