Java lambdas 20 times slower than anonymous classes

Tags: java, performance, algorithm, lambda

I've seen a lot of questions here about Java lambda performance, but most of them go like "Lambdas are slightly faster, but become slower when using closures" or "Warm-up vs execution times are different" or other such things.

However, I hit a rather strange thing here. Consider this LeetCode problem:

Given a set of non-overlapping intervals, insert a new interval into the intervals (merge if necessary).

You may assume that the intervals were initially sorted according to their start times.

The problem was tagged hard, so I assumed that a linear approach is not what they want there. So I decided to come up with a clever way to combine binary search with modifications to the input list. Now the problem is not very clear on modifying the input list: it says "insert", even though the signature requires returning a reference to a list, but never mind that for now. Here's the full code, but only the first few lines are relevant to this question. I'm keeping the rest here just so that anyone can try it:

    public List<Interval> insert(List<Interval> intervals, Interval newInterval) {
        int start = Collections.binarySearch(intervals, newInterval,
                                             (i1, i2) -> Integer.compare(i1.start, i2.start));
        int skip = start >= 0 ? start : -start - 1;
        int end = Collections.binarySearch(intervals.subList(skip, intervals.size()),
                                           new Interval(newInterval.end, 0),
                                           (i1, i2) -> Integer.compare(i1.start, i2.start));
        if (end >= 0) {
            end += skip; // back to original indexes
        } else {
            end -= skip; // ditto
        }
        int newStart = newInterval.start;
        int headEnd;
        if (-start - 2 >= 0) {
            Interval prev = intervals.get(-start - 2);
            if (prev.end < newInterval.start) {
                // the new interval doesn't overlap the one before the insertion point
                headEnd = -start - 1;
            } else {
                newStart = prev.start;
                headEnd = -start - 2;
            }
        } else if (start >= 0) {
            // merge the first interval
            headEnd = start;
        } else { // start == -1, insertion point = 0
            headEnd = 0;
        }
        int newEnd = newInterval.end;
        int tailStart;
        if (-end - 2 >= 0) {
            // merge the end with the previous interval
            newEnd = Math.max(newEnd, intervals.get(-end - 2).end);
            tailStart = -end - 1;
        } else if (end >= 0) {
            newEnd = intervals.get(end).end;
            tailStart = end + 1;
        } else { // end == -1, insertion point = 0
            tailStart = 0;
        }
        intervals.subList(headEnd, tailStart).clear();
        intervals.add(headEnd, new Interval(newStart, newEnd));
        return intervals;
    }

This worked fine and got accepted, but with 80 ms runtime, while most solutions were 4-5 ms and some 18-19 ms. When I looked them up, they were all linear and very primitive. Not something one would expect from a problem tagged "hard".

But here comes the question: my solution is also linear in the worst case (because the add/clear operations take linear time). Why is it that much slower? And then I did this:

    Comparator<Interval> comparator = new Comparator<Interval>() {
        @Override
        public int compare(Interval i1, Interval i2) {
            return Integer.compare(i1.start, i2.start);
        }
    };
    int start = Collections.binarySearch(intervals, newInterval, comparator);
    int skip = start >= 0 ? start : -start - 1;
    int end = Collections.binarySearch(intervals.subList(skip, intervals.size()),
                                       new Interval(newInterval.end, 0),
                                       comparator);

From 80 ms down to 4 ms! What's going on here? Unfortunately I have no idea what kind of tests LeetCode runs or under what environment, but still, isn't 20 times too much?

687

asked Jan 04 '16 05:01

Sergei Tachenov

1 Answer

You are obviously encountering the first-time initialization overhead of lambda expressions. As already mentioned in the comments, the classes for lambda expressions are generated at runtime rather than being loaded from your class path.

However, being generated isn’t the cause of the slowdown. After all, generating a class with a simple structure can be even faster than loading the same bytes from an external source, and the inner class has to be loaded too. But when the application hasn’t used lambda expressions before¹, even the framework for generating the lambda classes has to be loaded (Oracle’s current implementation uses ASM under the hood). This is the actual cause of the slowdown: the loading and initialization of a dozen internally used classes, not the lambda expression itself².

You can easily verify this. In your current code using lambda expressions, you have two identical expressions (i1, i2) -> Integer.compare(i1.start, i2.start). The current implementation doesn’t recognize this (and the compiler doesn’t provide a hint either). So here, two lambda instances, even having different classes, are generated. You can refactor the code to have only one comparator, similar to your inner class variant:

    final Comparator<? super Interval> comparator
      = (i1, i2) -> Integer.compare(i1.start, i2.start);
    int start = Collections.binarySearch(intervals, newInterval, comparator);
    int skip = start >= 0 ? start : -start - 1;
    int end = Collections.binarySearch(intervals.subList(skip, intervals.size()),
                                       new Interval(newInterval.end, 0),
                                       comparator);

You won’t notice any significant performance difference, as it’s not the number of lambda expressions that matters, but just the class loading and initialization of the framework, which happens exactly once.
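The statement that two textually identical lambda expressions end up with different classes can be checked directly. The following is only a minimal sketch, assuming an OpenJDK/HotSpot runtime; the behavior is an implementation detail rather than something the language specification guarantees. The class name `LambdaClassIdentity` and the `Interval` class are hypothetical stand-ins, since LeetCode's own `Interval` definition isn't shown in the question.

```java
import java.util.Comparator;

public class LambdaClassIdentity {
    // Hypothetical stand-in for LeetCode's Interval class (an assumption).
    static class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Two textually identical lambda expressions at two different sites.
    static Comparator<Interval> first() {
        return (i1, i2) -> Integer.compare(i1.start, i2.start);
    }

    static Comparator<Interval> second() {
        return (i1, i2) -> Integer.compare(i1.start, i2.start);
    }

    public static void main(String[] args) {
        Comparator<Interval> a = first();
        Comparator<Interval> b = second();
        // Print the runtime-generated class of each lambda instance.
        System.out.println(a.getClass());
        System.out.println(b.getClass());
        System.out.println("same class: " + (a.getClass() == b.getClass()));
    }
}
```

On the OpenJDK builds this was written against, the two printed class names differ, although both comparators behave identically.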

You can even max it out by inserting additional lambda expressions like

    final Comparator<? super Interval> comparator1
        = (i1, i2) -> Integer.compare(i1.start, i2.start);
    final Comparator<? super Interval> comparator2
        = (i1, i2) -> Integer.compare(i1.start, i2.start);
    final Comparator<? super Interval> comparator3
        = (i1, i2) -> Integer.compare(i1.start, i2.start);
    final Comparator<? super Interval> comparator4
        = (i1, i2) -> Integer.compare(i1.start, i2.start);
    final Comparator<? super Interval> comparator5
        = (i1, i2) -> Integer.compare(i1.start, i2.start);

without seeing any slowdown. It’s really the initial overhead of the very first lambda expression of the entire runtime you are noticing here. Since LeetCode itself apparently doesn’t use lambda expressions before entering your code, whose execution time gets measured, this overhead adds to your execution time here.
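To see the one-time cost in isolation, you can time the creation of an anonymous class instance, the first lambda, and a second lambda in a fresh JVM. This is a rough sketch, not a proper benchmark: `FirstLambdaCost`, `measure()` and the `Interval` class are hypothetical names introduced for illustration, and single `System.nanoTime()` samples are noisy. The harness deliberately avoids lambdas itself so that the first lambda in `measure()` is the first one the program evaluates.

```java
import java.util.Comparator;

public class FirstLambdaCost {
    // Hypothetical stand-in for LeetCode's Interval class (an assumption).
    static class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static Comparator<Interval> anonymous() {
        return new Comparator<Interval>() {
            @Override
            public int compare(Interval i1, Interval i2) {
                return Integer.compare(i1.start, i2.start);
            }
        };
    }

    static Comparator<Interval> lambdaA() {
        return (i1, i2) -> Integer.compare(i1.start, i2.start);
    }

    static Comparator<Interval> lambdaB() {
        return (i1, i2) -> Integer.compare(i1.start, i2.start);
    }

    /** Returns creation times in nanoseconds: {anonymous class, first lambda, second lambda}. */
    static long[] measure() {
        long t0 = System.nanoTime();
        Comparator<Interval> anon = anonymous();
        long t1 = System.nanoTime();
        Comparator<Interval> l1 = lambdaA();
        long t2 = System.nanoTime();
        Comparator<Interval> l2 = lambdaB();
        long t3 = System.nanoTime();

        // Use all three comparators so that none of them is dead code.
        Interval x = new Interval(1, 2);
        Interval y = new Interval(3, 4);
        int check = anon.compare(x, y) + l1.compare(x, y) + l2.compare(x, y);
        if (check != -3) {
            throw new IllegalStateException("unexpected comparison result: " + check);
        }
        return new long[] { t1 - t0, t2 - t1, t3 - t2 };
    }

    public static void main(String[] args) {
        long[] ns = measure();
        System.out.println("anonymous class: " + ns[0] + " ns");
        System.out.println("first lambda:    " + ns[1] + " ns");
        System.out.println("second lambda:   " + ns[2] + " ns");
    }
}
```

If the explanation above holds for your runtime, the first lambda should stand out while the second costs far less. On newer JDKs the gap may be much smaller, or not attributable to your code at all, for the reasons given in the footnotes below.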

See also “How will Java lambda functions be compiled?” and “Does a lambda expression create an object on the heap every time it's executed?”

¹ This implies that JDK code that will be executed before handing control over to your application doesn’t use lambda expressions itself. Since this code stems from times before the introduction of lambda expressions, this is usually the case. With newer JDKs, modular software will be initialized by different, newer code, which seems to use lambda expressions, so the initialization of the runtime facility can’t be measured within the application anymore in these setups.

² The initialization time has been reduced significantly in newer JDKs. There are different possible causes: general performance improvements, dedicated lambda optimizations, or both. Improving initialization time in general is an issue that the JDK developers did not forget.

answered Sep 20 '22 15:09

Holger
