How optimized are Java 8 stream filters over collection methods?

For example I have a comma separated string:

String multiWordString = "... , ... , ... ";

I want to check whether another string, str, is present in the comma-separated string. I can do either of the following:

1.

boolean contains = Arrays.asList(multiWordString.split(",")).contains(str);

2.

boolean contains = Arrays.asList(multiWordString.split(",")).stream().filter(e -> e.equals(str)).findFirst().isPresent();

EDIT: The sample string happens to use a comma as the delimiter. I should have used a better name for the sample string to avoid confusion, so I have updated it. In this question I am trying to find the performance difference between using Java 8 streams and using loops/collection methods.
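A minimal, self-contained sketch of the two approaches from the question, plus the plain loop mentioned in the edit, might look like the following. The class `ContainsDemo` and its method names are hypothetical. Since `findFirst()` returns an `Optional<String>`, the stream version needs `isPresent()` to produce a `boolean`; this sketch swaps in `anyMatch`, which returns a `boolean` directly and also stops at the first match.

```java
import java.util.Arrays;

public class ContainsDemo {

    // Approach 1: wrap the split array in a List and use List.contains
    static boolean containsList(String multiWordString, String str) {
        return Arrays.asList(multiWordString.split(",")).contains(str);
    }

    // Approach 2: stream over the list; anyMatch short-circuits and returns a boolean
    static boolean containsStream(String multiWordString, String str) {
        return Arrays.asList(multiWordString.split(",")).stream().anyMatch(e -> e.equals(str));
    }

    // Plain loop over the split array, with no List or Stream involved
    static boolean containsLoop(String multiWordString, String str) {
        for (String token : multiWordString.split(",")) {
            if (token.equals(str)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String csv = "a,b,c";
        System.out.println(containsList(csv, "b"));   // true
        System.out.println(containsStream(csv, "b")); // true
        System.out.println(containsLoop(csv, "x"));   // false
    }
}
```

Note that `split(",")` keeps any whitespace around each token, so with a string like `"a , b"` the second token is `" b"` and would need `trim()` before comparing.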

tryingToLearn asked Dec 24 '22 17:12


2 Answers

Without measuring, it's impossible to tell: the internal details of how one solution or the other behaves can change, so the best way is to measure. It is known, though, that streams are a bit slower; they do have some infrastructure behind them.

Here is a naive, simple JMH test (with little data):

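import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)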
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class CSVParsing {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(CSVParsing.class.getSimpleName())
                .jvmArgs("-ea")
                .shouldFailOnError(true)
                .build();
        new Runner(opt).run();
    }

    @Param(value = { "a,e, b,c,d",
            "a,b,c,d, a,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,da,b,c,d, e",
            "r, m, n, t,r, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, tr, m, n, t, e" })
    String csv;

    @Fork(1)
    @Benchmark
    public boolean containsSimple() {
        return Arrays.asList(csv.split(",")).contains("e");
    }

    @Fork(1)
    @Benchmark
    public boolean containsStream() {
        return Arrays.asList(csv.split(",")).stream().filter(e -> e.equals("e")).findFirst().isPresent();
    }

    @Fork(1)
    @Benchmark
    public boolean containsStreamParallel() {
        // parallel() switches the pipeline to a parallel stream (common ForkJoinPool)
        return Arrays.asList(csv.split(",")).stream().parallel().filter(e -> e.equals("e")).findFirst().isPresent();
    }
}

Even if you don't understand the code, the results are simple numbers that you can compare (average time per operation, in nanoseconds):

 Benchmark                          (csv)              Score     Error   Units
 CSVParsing.containsSimple          first parameter  181.201 ±  5.390   ns/op
 CSVParsing.containsStream          first parameter  255.851 ±  5.598   ns/op
 CSVParsing.containsStreamParallel  first parameter  295.296 ± 57.800   ns/op

I am not going to show the rest of the results (for other parameters) since they are in the same range.

The bottom line is that they do differ, by roughly 75 to 115 ns here. Let me reiterate that: nanoseconds.

There is indeed a difference, but if you honestly care about a difference this small, then CSV parsing is probably the wrong choice in the first place.

Eugene answered Dec 28 '22 07:12


Watch out: CSVs are generally more complex than just commas separating strings; there are escaped commas (inside quoted fields) to worry about as well. I hope this is either just an example or not an actual CSV format being imported.
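To illustrate the warning, here is a sketch, assuming RFC 4180-style quoting (fields wrapped in double quotes, with `""` as an escaped quote), of how a naive `split(",")` breaks a quoted field apart. `splitCsvLine` is a hypothetical helper, not a library API, and it ignores line breaks inside quoted fields; for real CSV input, a dedicated CSV parsing library is usually the safer choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvSplitDemo {

    // Hypothetical helper: splits a single CSV line, honouring double-quoted
    // fields and "" as an escaped quote inside a quoted field (RFC 4180 style).
    // Line breaks inside quoted fields are NOT handled.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes -> literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c); // commas inside quotes are kept
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        String line = "a,\"b,c\",d";
        System.out.println(Arrays.asList(line.split(","))); // [a, "b, c", d]  (4 tokens)
        System.out.println(splitCsvLine(line));             // [a, b,c, d]     (3 fields)
    }
}
```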

You shouldn't convert from an array to a list first; go straight from the array to a stream using Arrays.stream() or Stream.of().
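A minimal sketch of streaming straight from the array, assuming the same `split(",")` input as in the question. The class and method names are hypothetical, and `anyMatch` replaces the question's `filter(...).findFirst()` chain since it yields the `boolean` directly.

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamFromArrayDemo {

    // Stream directly over the array produced by split, with no intermediate List
    static boolean containsArraysStream(String multiWordString, String str) {
        return Arrays.stream(multiWordString.split(",")).anyMatch(str::equals);
    }

    // Stream.of(T...) accepts the String[] as its varargs argument
    static boolean containsStreamOf(String multiWordString, String str) {
        return Stream.of(multiWordString.split(",")).anyMatch(str::equals);
    }

    public static void main(String[] args) {
        System.out.println(containsArraysStream("a,b,c", "c")); // true
        System.out.println(containsStreamOf("a,b,c", "z"));     // false
    }
}
```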

That said, streams are lazy, and they only do as much work as they need to.

.contains(str) will abort as soon as it finds a match.
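One way to see the laziness is to count how many tokens the stream actually visits. In this sketch (class and method names are hypothetical), `peek` is used purely as a debugging hook to increment a counter; on a sequential stream, `anyMatch` stops pulling elements as soon as the predicate matches. Note that `split` itself is eager: the whole string is split into an array before the stream starts, so only the comparison step is lazy.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class LazinessDemo {

    // Counts how many tokens a sequential stream inspects before anyMatch finishes
    static int tokensInspected(String csv, String target) {
        AtomicInteger inspected = new AtomicInteger();
        Arrays.stream(csv.split(","))
              .peek(token -> inspected.incrementAndGet()) // debugging hook: counts visited tokens
              .anyMatch(target::equals);                  // short-circuits on the first match
        return inspected.get();
    }

    public static void main(String[] args) {
        // "b" is the 2nd of 5 tokens, so only 2 are inspected
        System.out.println(tokensInspected("a,b,c,d,e", "b")); // 2
        // no match: all 5 tokens are inspected
        System.out.println(tokensInspected("a,b,c,d,e", "z")); // 5
    }
}
```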

It's hard to tell performance without measuring it, so for now make the program correct and easy to maintain.

If performance is a concern, then once you have a reasonable amount done, profile it, see which parts could be better, try alternatives, and pick the winner.

Ryan Leach answered Dec 28 '22 07:12