What is the best method of splitting a String into a Stream?
I saw these variations:
Arrays.stream("b,l,a".split(","))
Stream.of("b,l,a".split(","))
Pattern.compile(",").splitAsStream("b,l,a")
My priorities are robustness, readability, and performance.
A complete, compilable example:
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class HelloWorld {

    public static void main(String[] args) {
        stream1().forEach(System.out::println);
        stream2().forEach(System.out::println);
        stream3().forEach(System.out::println);
    }

    private static Stream<String> stream1() {
        return Arrays.stream("b,l,a".split(","));
    }

    private static Stream<String> stream2() {
        return Stream.of("b,l,a".split(","));
    }

    private static Stream<String> stream3() {
        return Pattern.compile(",").splitAsStream("b,l,a");
    }
}
split()
The method String.split() splits a String into multiple Strings around the delimiter that separates them. The returned object is a new array containing the split Strings; the original string is not changed. We can also pass a limit on the number of elements in the returned array.
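To make the limit argument concrete, here is a small sketch. The class name SplitLimitDemo and the helper splitWithLimit are made up for illustration; only String.split(String, int) itself is part of the JDK.

```java
import java.util.Arrays;

class SplitLimitDemo {

    // Hypothetical helper: splits around the delimiter, producing at most `limit` elements.
    static String[] splitWithLimit(String input, String delimiter, int limit) {
        return input.split(delimiter, limit);
    }

    public static void main(String[] args) {
        // Without a limit, every delimiter is applied.
        System.out.println(Arrays.toString("b,l,a".split(",")));              // [b, l, a]
        // With a limit of 2, the unsplit remainder stays in the last element.
        System.out.println(Arrays.toString(splitWithLimit("b,l,a", ",", 2))); // [b, l,a]
    }
}
```

Note that the delimiter is interpreted as a regular expression, so characters such as "." or "|" would need escaping.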
Arrays.stream/String.split
Since String.split returns an array String[], I always recommend Arrays.stream as the canonical idiom for streaming over an array.
String input = "dog,cat,bird";
Stream<String> stream = Arrays.stream(input.split( "," ));
stream.forEach(System.out::println);
Stream.of/String.split
Stream.of is a varargs method which just happens to accept an array, because varargs methods are implemented via arrays and there were compatibility concerns when varargs were introduced to Java and existing methods were retrofitted to accept variable arguments.
Stream<String> stream = Stream.of(input.split(",")); // works, but is non-idiomatic
Stream<String> stream = Stream.of("dog", "cat", "bird"); // intended use case
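One case where the choice between the two visibly matters is primitive arrays, which may help explain why Arrays.stream is the safer habit. The following is only an illustrative sketch; the class and method names are invented for this example. With an int[], Stream.of sees a single element of type int[], while Arrays.stream has a dedicated int[] overload that returns an IntStream.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ArrayStreamDemo {

    // Stream.of(int[]) binds to Stream.of(T) with T = int[], so the stream has one element.
    static long countViaStreamOf(int[] values) {
        return Stream.of(values).count();
    }

    // Arrays.stream(int[]) returns an IntStream over the individual ints.
    static long countViaArraysStream(int[] values) {
        return Arrays.stream(values).count();
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        System.out.println(countViaStreamOf(numbers));     // 1
        System.out.println(countViaArraysStream(numbers)); // 3
    }
}
```

For a String[] such as the result of String.split, both calls produce the same elements, so this difference only shows up with primitive arrays.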
Pattern.splitAsStream
Pattern.compile(",").splitAsStream(string) has the advantage of streaming directly rather than creating an intermediate array. So for a large number of substrings, this can have a performance benefit. On the other hand, if the delimiter is trivial, i.e. a single literal character, the String.split implementation will go through a fast path instead of using the regex engine. So in this case, the answer is not trivial.
Stream<String> stream = Pattern.compile(",").splitAsStream(input);
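Streaming directly also combines naturally with short-circuiting operations. As a sketch (the class FirstTokenDemo and its method are hypothetical names), taking only the first token does not require building an array of all tokens first; as far as I know, splitAsStream finds matches on demand, but treat that as an assumption to verify against your JDK version.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class FirstTokenDemo {

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile(",");

    // Returns the first comma-separated token, if the stream yields one.
    static Optional<String> firstToken(String input) {
        return COMMA.splitAsStream(input).findFirst();
    }

    public static void main(String[] args) {
        System.out.println(firstToken("dog,cat,bird").orElse("")); // dog
    }
}
```

With String.split, the whole input would be split into an array before the first element could be read.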
If the streaming happens inside another stream, e.g. .flatMap(Pattern.compile(pattern)::splitAsStream), there is the advantage that the pattern has to be analyzed only once, rather than for every string of the outer stream.
Stream<String> stream = Stream.of("a,b", "c,d,e", "f", "g,h,i,j")
.flatMap(Pattern.compile(",")::splitAsStream);
This is a property of method references of the form expression::name, which will evaluate the expression and capture the result when creating the instance of the functional interface, as explained in "What is the equivalent lambda expression for System.out::println" and "java.lang.NullPointerException is thrown using a method-reference but not a lambda expression".
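The difference between the method reference and the equivalent-looking lambda can be observed with a counter. This is a sketch only: countingCompile and COMPILE_CALLS are hypothetical helpers added here to count how often the pattern expression is evaluated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CompileCountDemo {

    // Counts how many times countingCompile has been invoked.
    static final AtomicInteger COMPILE_CALLS = new AtomicInteger();

    // Hypothetical helper: compiles the regex and records the call.
    static Pattern countingCompile(String regex) {
        COMPILE_CALLS.incrementAndGet();
        return Pattern.compile(regex);
    }

    // expression::name form: the expression is evaluated once, when the reference is created.
    static List<String> splitWithMethodRef(List<String> lines) {
        return lines.stream()
                .flatMap(countingCompile(",")::splitAsStream)
                .collect(Collectors.toList());
    }

    // Lambda form: the body, including the compile call, runs for every element.
    static List<String> splitWithLambda(List<String> lines) {
        return lines.stream()
                .flatMap(line -> countingCompile(",").splitAsStream(line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,b", "c,d,e", "f", "g,h,i,j");

        splitWithMethodRef(lines);
        System.out.println(COMPILE_CALLS.getAndSet(0)); // 1

        splitWithLambda(lines);
        System.out.println(COMPILE_CALLS.getAndSet(0)); // 4
    }
}
```

If you prefer the lambda form for readability, storing the compiled Pattern in a local variable or constant first gives the same single compilation.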
Regarding (1) and (2), there shouldn't be much difference, as your code is almost the same.
Regarding (3), that would be much more efficient in terms of memory (not necessarily CPU), but in my opinion, a bit harder to read.
Robustness
I can see no difference in the robustness of the three approaches.
Readability
I am not aware of any credible scientific studies on code readability involving experienced Java programmers, so readability is a matter of opinion. Even then, you never know if someone giving their opinion is making an objective distinction between actual readability, what they have been taught about readability, and their own personal taste.
So I will leave it to you to make your own judgements on readability ... noting that you do consider this to be a high priority.
FWIW, the only people whose opinions matter on this are you and your team.
Performance
I think that the answer to that is to carefully benchmark the three alternatives. Holger provides an analysis based on his study of some versions of Java. But:
Stream object, what garbage collector you have selected (since the different versions apparently generate different amounts of garbage), and other issues.
So if you (or anyone else) are really concerned with the performance, you should write a micro-benchmark and run it on your production platform(s). Then do some application-specific benchmarking. And you should consider looking at solutions that don't involve streams.
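As a starting point only, a very crude timing harness might look like the sketch below. All names here (SplitTimingSketch, totalLength, averageNanos) are invented for illustration, and hand-rolled timing like this is easily distorted by JIT compilation and garbage collection, so a dedicated harness such as JMH would be the more trustworthy choice for real measurements. The sketch sums token lengths rather than counting elements, so that every variant has to traverse its stream.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class SplitTimingSketch {

    private static final Pattern COMMA = Pattern.compile(",");

    // Forces a full traversal by summing the lengths of all tokens.
    static long totalLength(Function<String, Stream<String>> splitter, String input) {
        return splitter.apply(input).mapToInt(String::length).sum();
    }

    // Returns the average nanoseconds per call, measured after an equally long warm-up.
    static double averageNanos(Function<String, Stream<String>> splitter, String input, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += totalLength(splitter, input); // warm-up, not timed
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += totalLength(splitter, input);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative sink"); // keeps the result in use
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Assumed test input: 1000 short tokens; vary this to match your real data.
        String input = String.join(",", Collections.nCopies(1000, "token"));
        int iterations = 10_000;

        System.out.println("Arrays.stream: " + averageNanos(s -> Arrays.stream(s.split(",")), input, iterations));
        System.out.println("Stream.of:     " + averageNanos(s -> Stream.of(s.split(",")), input, iterations));
        System.out.println("splitAsStream: " + averageNanos(COMMA::splitAsStream, input, iterations));
    }
}
```

The numbers it prints are only meaningful relative to each other on the same machine and JVM, and should be repeated with inputs that resemble your production strings.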