Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my java lambda with a dummy assignment much faster than without it?

I know that making judgements of Java microbenchmarks is extremely fraught, but I'm seeing something that seems odd, and I'd like to get some explanation for it.

Note that I'm not using the JMH framework for this. I'm aware of it, but I didn't want to go to that length for this.

I'll provide the entire code sample, but in short, when I test the performance of these two methods

private FooPrime[] testStreamToArray(ArrayList<Foo> fooList) {
    return (FooPrime[]) fooList.stream().
                map(it -> {
                    return new FooPrime().gamma(it.getAlpha() + it.getBeta());
                }).
                toArray(FooPrime[]::new);
}

private FooPrime[] testStreamToArray2(ArrayList<Foo> fooList) {
    return (FooPrime[]) fooList.stream().
                map(it -> {
                    int stuff = it.getAlpha().length();
                    return new FooPrime().gamma(it.getAlpha() + it.getBeta());
                }).
                toArray(FooPrime[]::new);
}

I find very surprising results. In the larger code sample, I'm measuring four different ways of doing this, and the first three are very close in performance. They all run about 50k ns per iteration. However, the second code sample consistently runs just under half of that total. That's right. It's not slower, it's quite a bit faster.

The last run shows numbers like this:

manualcopy:54575 ns
toarray:53617 ns
streamtoarray:52990 ns
streamtoarray2:24217 ns

Each run has numbers similar to these.

I'll now provide the entire class and base class. Note that I do have a "warm-up" pass, where I execute the methods under test a few thousand times before starting the timings. Also note that although this runs "testStreamToArray2" last, I also tried moving that block to the first test, and the numbers come out about the same. The commented out lines are there to convince me that the methods are actually doing something (the timings are still about the same with those lines not commented out).

package timings;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListToArrayOfPrimesTiming {

    public static void main(String[] args) {
        ListToArrayOfPrimesTiming tests = new ListToArrayOfPrimesTiming(args);
        tests.go();
    }

    public ListToArrayOfPrimesTiming(String[] args) { }

    private void go() {

        final ArrayList<Foo> fooList    = new ArrayList<>();

        for (int ctr = 0; ctr < 1000; ++ ctr) {
            fooList.add(new Foo().alpha("a" + ctr).beta("b" + ctr));
        }

        for (int ctr = 0; ctr < 20000; ++ ctr) {
            testManualCopy(fooList);
            testToArray(fooList);
            testStreamToArray(fooList);
            testStreamToArray2(fooList);
        }

        int iters   = 100000;

//      Set<Integer> lengths    = new HashSet<>();
//      Set<FooPrime>   distinctFooPrimes   = new HashSet<>();
//      lengths.clear();
//      distinctFooPrimes.clear();

        new TimingContainer(iters, "manualcopy", new TimingTest() {
            @Override
            public void run() {
                FooPrime[] fooPrimeArray = testManualCopy(fooList);
//              lengths.add(fooPrimeArray.length);
//              distinctFooPrimes.add(fooPrimeArray[0]);
            }
        }).run();

//      System.out.println("lengths[" + lengths + "]");
//      lengths.clear();
//      System.out.println("distinctFooPrimes[" + distinctFooPrimes + "]");
//      distinctFooPrimes.clear();

        new TimingContainer(iters, "toarray", new TimingTest() {
            @Override
            public void run() {
                FooPrime[] fooPrimeArray = testManualCopy(fooList);
//              lengths.add(fooPrimeArray.length);
//              distinctFooPrimes.add(fooPrimeArray[0]);
            }
        }).run();

//      System.out.println("lengths[" + lengths + "]");
//      lengths.clear();
//      System.out.println("distinctFooPrimes[" + distinctFooPrimes + "]");
//      distinctFooPrimes.clear();

        new TimingContainer(iters, "streamtoarray", new TimingTest() {
            @Override
            public void run() {
                FooPrime[] fooPrimeArray = testStreamToArray(fooList);
//              lengths.add(fooPrimeArray.length);
//              distinctFooPrimes.add(fooPrimeArray[0]);
            }
        }).run();

//      System.out.println("lengths[" + lengths + "]");
//      lengths.clear();
//      System.out.println("distinctFooPrimes[" + distinctFooPrimes + "]");
//      distinctFooPrimes.clear();

        new TimingContainer(iters, "streamtoarray2", new TimingTest() {
            @Override
            public void run() {
                FooPrime[] fooPrimeArray = testStreamToArray2(fooList);
//              lengths.add(fooPrimeArray.length);
//              distinctFooPrimes.add(fooPrimeArray[0]);
            }
        }).run();

//      System.out.println("lengths[" + lengths + "]");
//      lengths.clear();
//      System.out.println("distinctFooPrimes[" + distinctFooPrimes + "]");
//      distinctFooPrimes.clear();
    }

    private FooPrime[] testManualCopy(ArrayList<Foo> fooList) {
        FooPrime[] fooPrimeArray    = new FooPrime[fooList.size()];
        int index = -1;
        for (Foo foo: fooList) {
            ++ index;
            fooPrimeArray[index]    = new FooPrime().gamma(foo.getAlpha() + foo.getBeta());
        }
        return fooPrimeArray;
    }

    private FooPrime[] testToArray(ArrayList<Foo> fooList) {
        List<FooPrime>  fooPrimeList    = new ArrayList<>();
        for (Foo foo: fooList) {
            fooPrimeList.add(new FooPrime().gamma(foo.getAlpha() + foo.getBeta()));
        }
        return fooPrimeList.toArray(new FooPrime[fooList.size()]);
    }

    private FooPrime[] testStreamToArray(ArrayList<Foo> fooList) {
        return (FooPrime[]) fooList.stream().
                    map(it -> {
                        return new FooPrime().gamma(it.getAlpha() + it.getBeta());
                    }).
                    toArray(FooPrime[]::new);
    }

    private FooPrime[] testStreamToArray2(ArrayList<Foo> fooList) {
        return (FooPrime[]) fooList.stream().
                    map(it -> {
                        int stuff = it.getAlpha().length();
                        return new FooPrime().gamma(it.getAlpha() + it.getBeta());
                    }).
                    toArray(FooPrime[]::new);
    }

    public static FooPrime fooToFooPrime(Foo foo) {
        return new FooPrime().gamma(foo.getAlpha() + foo.getBeta());
    }

    public static class Foo {
        private String alpha;
        private String beta;

        public String getAlpha() { return alpha; }
        public String getBeta() { return beta; }

        public void setAlpha(String alpha) { this.alpha = alpha; }
        public void setBeta(String beta) { this.beta = beta; }

        public Foo alpha(String alpha) { this.alpha = alpha; return this; }
        public Foo beta(String beta) { this.beta = beta; return this; }
    }

    public static class FooPrime {
        private String gamma;

        public String getGamma() { return gamma; }

        public void setGamma(String gamma) { this.gamma = gamma; }

        public FooPrime gamma(String gamma) { this.gamma = gamma; return this; }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((gamma == null) ? 0 : gamma.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            FooPrime other = (FooPrime) obj;
            if (gamma == null) {
                if (other.gamma != null)
                    return false;
            } else if (!gamma.equals(other.gamma))
                return false;
            return true;
        }

        @Override
        public String toString() {
            return "FooPrime [gamma=" + gamma + "]";
        }
    }
}

And the base class:

package timings;

public class TimingContainer {
    private int         iterations;
    private String      label;
    private TimingTest  timingTest;

    public TimingContainer(int iterations, String label, TimingTest timingTest) {
        this.iterations = iterations;
        this.label      = label;
        this.timingTest = timingTest;
    }

    public void run() {
        long startTime  = System.nanoTime();
        for (int ctr = 0; ctr < iterations; ++ ctr) {
            timingTest.randomize();
            timingTest.run();
        }
        long    endTime = System.nanoTime();
        long    totalns = (endTime - startTime);
        System.out.println(label + ":" + (totalns / iterations) + " ns");
    }
}
like image 330
David M. Karr Avatar asked Jan 12 '17 20:01

David M. Karr


1 Answers

(Revised answer.)

Benchmarking in Java is difficult. Still, let us throw JMH at it... I ported your benchmark to JMH (see http://github.com/lemire/microbenchmarks).

These are the relevant methods...

    public FooPrime[] basicstream(BenchmarkState s) {
            return (FooPrime[]) s.fooList.stream().map(it -> {
                    return new FooPrime().gamma(it.getAlpha() + it.getBeta());
            }).toArray(FooPrime[]::new);
    }

    public FooPrime[] tweakedbasicstream(BenchmarkState s) {
            return (FooPrime[]) s.fooList.stream().map(it -> {
                    int stuff = it.getAlpha().length();
                    return new FooPrime().gamma(it.getAlpha() + it.getBeta());
            }).toArray(FooPrime[]::new);
    }

And here is the result of my run...

git clone https://github.com/lemire/microbenchmarks.git
cd microbenchmarks
mvn clean install
java -cp target/microbenchmarks-0.0.1-jar-with-dependencies.jar me.lemire.microbenchmarks.mysteries.MysteriousLambda
Benchmark                                      Mode  Samples      Score    Error  Units
m.l.m.m.MysteriousLambda.basicstream           avgt        5  17013.784 ± 46.536  ns/op
m.l.m.m.MysteriousLambda.tweakedbasicstream    avgt        5  16240.451 ± 67.884  ns/op

Oddly, it seems that the two functions do not run at exactly the same average speed, there is a fairly significant difference. And that's while using JMH, a fairly good framework for benchmarking.

I thought at first that your two pieces of code were logically equivalent, but they are not. The apparently useless length method access forces the code to throw an exception when the String object returned is null.

So it is actually closer to the following piece of code...

    @Benchmark
    public FooPrime[] nullbasicstream(BenchmarkState s) {
            return (FooPrime[]) s.fooList.stream().map(it -> {
                    if( it.getAlpha() == null) throw new NullPointerException();
                    return new FooPrime().gamma(it.getAlpha() + it.getBeta());
            }).toArray(FooPrime[]::new);
    }

And this is even faster than your tweaked function...

Benchmark                                      Mode  Samples      Score    Error  Units
m.l.m.m.MysteriousLambda.basicstream           avgt        5  17013.784 ± 46.536  ns/op
m.l.m.m.MysteriousLambda.nullbasicstream       avgt        5  15983.762 ± 92.593  ns/op
m.l.m.m.MysteriousLambda.tweakedbasicstream    avgt        5  16240.451 ± 67.884  ns/op

Why might this be?

Let us sidestep Java 8's stream programming and write the function the silly old way, with and without the null check:

    @Benchmark
    public FooPrime[] basicsum(BenchmarkState s) {
            int howmany = s.fooList.size();
            FooPrime[] answer = new FooPrime[s.fooList.size()];
            for(int k = 0; k < howmany ; ++k ) {
                    Foo x = s.fooList.get(k);
                    answer[k] = new FooPrime(x.getAlpha() + x.getBeta());
            }
            return answer;
    }

    @Benchmark
    public FooPrime[] basicsumnull(BenchmarkState s) {
            int howmany = s.fooList.size();
            FooPrime[] answer = new FooPrime[s.fooList.size()];
            for(int k = 0; k < howmany ; ++k ) {
                    Foo x = s.fooList.get(k);
                    if(x.getAlpha() == null) throw new NullPointerException();
                    answer[k] = new FooPrime(x.getAlpha() + x.getBeta());
            }
            return answer;
    }

And that is how we get the best performance...

 m.l.m.m.MysteriousLambda.basicstream                        avgt        5  17019.730 ±  61.982  ns/op
 m.l.m.m.MysteriousLambda.nullbasicstream                    avgt        5  16019.332 ±  62.831  ns/op
 m.l.m.m.MysteriousLambda.basicsum                           avgt        5  15635.474 ± 119.890  ns/op
 m.l.m.m.MysteriousLambda.basicsumnull                       avgt        5  14342.016 ± 109.958  ns/op

But the benefit of the null check remains.

Ok. Let us benchmark just the string sums, without anything else (no custom class). Let us have both the standard sum and the sum preceeded by a null check:

    @Benchmark
    public void stringsum(BenchmarkState s) {
            for(int k = 0; k < s.N; ++k) s.list3[k] = s.list1[k] + s.list2[k];
    }


    @Benchmark
    public void stringsum_withexcept(BenchmarkState s) {
            for(int k = 0; k < s.N; ++k) {
                    if(s.list1[k] == null) throw new NullPointerException();
                    s.list3[k] = s.list1[k] + s.list2[k];
            }
    }

We get that the null check slows us down...

    m.l.m.m.StringMerge.stringsum               avgt        5  27011.111 ±  4.077  ns/op
    m.l.m.m.StringMerge.stringsum_withexcept    avgt        5  28387.825 ± 82.523  ns/op
like image 197
Daniel Lemire Avatar answered Oct 28 '22 06:10

Daniel Lemire