For example, I have a comma-separated string:
String multiWordString= "... , ... , ... ";
I want to check whether another string, str, is present in that comma-separated string. I can do either of the following two things:
1.
boolean contains = Arrays.asList(multiWordString.split(",")).contains(str);
2.
boolean contains = Arrays.asList(multiWordString.split(",")).stream().filter(e -> e.equals(str)).findFirst().isPresent();
EDIT: The sample string happens to use a comma as its delimiter. I should have used a better name for the sample string to avoid confusion, so I have updated the name. In this question I am trying to find the performance difference between using Java 8 streams and loops/collection methods.
Without tests it's impossible to tell. The internal details of how one solution or the other behaves can change, so the best way is to measure. It is known, though, that streams are a bit slower, since they do have some infrastructure behind them.
Here is a naive simple test (with little data):
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class CSVParsing {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(CSVParsing.class.getSimpleName())
                .jvmArgs("-ea")
                .shouldFailOnError(true)
                .build();
        new Runner(opt).run();
    }

    // Each benchmark runs once per input string; every input contains the exact token "e".
    @Param(value = { "a,e, b,c,d",
            "a,b,c,d, a,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,d, e",
            "r, m, n, t,r, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, t, e" })
    String csv;

    // Collection method: List.contains on the split tokens.
    @Fork(1)
    @Benchmark
    public boolean containsSimple() {
        return Arrays.asList(csv.split(",")).contains("e");
    }

    // Sequential stream over the same list.
    @Fork(1)
    @Benchmark
    public boolean containsStream() {
        return Arrays.asList(csv.split(",")).stream().filter(e -> e.equals("e")).findFirst().isPresent();
    }

    // Same pipeline, switched to parallel execution.
    @Fork(1)
    @Benchmark
    public boolean containsStreamParallel() {
        return Arrays.asList(csv.split(",")).stream().parallel().filter(e -> e.equals("e")).findFirst().isPresent();
    }
}
Even if you don't understand the code, the results are simple numbers that you can compare:
Benchmark (first parameter)          Score, ns/op
CSVParsing.containsSimple            181.201 ±  5.390
CSVParsing.containsStream            255.851 ±  5.598
CSVParsing.containsStreamParallel    295.296 ± 57.800
I am not going to show the rest of the results (for other parameters) since they are in the same range.
Bottom line is they do differ, by roughly 100 ns; let me reiterate that: nanoseconds.
There is a difference indeed; but if you really honestly care about this diff, then csv parsing is probably the wrong choice in the first place.
Watch out: CSVs are generally more complex than just commas separating strings; there are escaped commas to worry about as well. I hope this is either just an example or not a CSV format being imported.
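To illustrate that warning, here is a rough sketch of how a quoted field trips up a plain split(","). The class QuotedCsvSketch and its splitLine helper are hypothetical names made up for this example, and the helper only assumes the common convention of double-quoted fields with doubled quotes as escapes. It is not a full CSV parser; a real import should use a dedicated library.

```java
import java.util.ArrayList;
import java.util.List;

final class QuotedCsvSketch {

    // Hypothetical helper: splits one line on commas that sit outside double quotes.
    // A doubled quote ("") inside a quoted field is treated as one literal quote.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas in here are plain data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"Smith, John\",42";
        // The naive split cuts the quoted field in two.
        System.out.println("naive split:  " + line.split(",").length + " fields");
        System.out.println("quote-aware:  " + splitLine(line).size() + " fields");
    }
}
```

With the naive split, a contains check for "Smith, John" can never succeed, because that value no longer exists as a single token.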
You shouldn't convert from an array to a list first; go straight from the array to the stream using Arrays.stream or Stream.of().
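A minimal sketch of that suggestion, assuming exact, untrimmed token matching as in the question. The class and method names are invented for illustration, and anyMatch is used here in place of filter(...).findFirst().isPresent() since it expresses the same check more directly:

```java
import java.util.Arrays;

final class ArrayStreamContains {

    // Hypothetical helper: true if any comma-separated token equals str exactly.
    // Tokens are not trimmed, so " e" does not match "e".
    static boolean containsToken(String multiWordString, String str) {
        return Arrays.stream(multiWordString.split(","))
                     .anyMatch(str::equals);
    }

    public static void main(String[] args) {
        System.out.println(containsToken("a,e, b,c,d", "e"));
        System.out.println(containsToken("a,b, e", "e"));
    }
}
```

Whether skipping the intermediate list makes a measurable difference is something only a benchmark on your own data can show.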
But the streams are lazy, and they only do as much work as they need to do.
.contains(str) will abort as soon as it finds a match.
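The laziness can be observed directly. The sketch below is an illustration with made-up names: it counts how many tokens a sequential stream actually looks at before findFirst() is satisfied. It uses peek purely as a counting probe, which is fine for a demo but not something to rely on in production code.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyStreamDemo {

    // Hypothetical helper: returns how many tokens the sequential pipeline
    // examined before findFirst() stopped pulling elements.
    static int elementsExamined(String[] tokens, String target) {
        AtomicInteger examined = new AtomicInteger();
        Arrays.stream(tokens)
              .peek(t -> examined.incrementAndGet()) // counts each token that flows through
              .filter(target::equals)
              .findFirst();                          // short-circuits on the first match
        return examined.get();
    }

    public static void main(String[] args) {
        String[] tokens = "a,e,b,c,d".split(",");
        System.out.println("examined: " + elementsExamined(tokens, "e"));
    }
}
```

So both approaches stop at the first match; the remaining difference is the fixed overhead of setting up the stream pipeline.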
It's hard to tell performance without measuring it, so for now make the program correct and easy to maintain.
If performance is a concern, after you have some amount done, profile and see what bits could be better, try alternatives, then pick the winner.