
Why does adding in an immediately invoked lambda make my JavaScript code 2x faster?

I'm optimizing a compiler that targets JavaScript, and found a very interesting, if somewhat frustrating, case:

function add(n,m) {
  return n === 0 ? m : add(n - 1, m) + 1;
}
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

It takes 2.3s to complete on my machine[1]. But if I make a very small change:

function add(n,m) {
  return (() => n === 0 ? m : add(n - 1, m) + 1)();
}
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

It completes in 1.1s. Notice that the only difference is the addition of an immediately invoked lambda, (() => ...)(), around the return expression of add. Why does this extra call make my program twice as fast?

[1] MacBook Pro 13" 2020, 2.3 GHz Quad-Core Intel Core i7, Node.js v15.3.0

Asked Nov 27 '20 by MaiaVictor


1 Answer

Interesting! From looking at the code, it seems fairly obvious that the IIFE-wrapped version should be slower, not faster: in every loop iteration, it creates a new function object and calls it (which the optimizing compiler will eventually avoid, but that doesn't kick in right away), so generally just does more work, which should be taking more time.

The explanation in this case is inlining.

A bit of background: inlining one function into another (instead of calling it) is one of the standard tricks that optimizing compilers perform in order to achieve better performance. It's a double-edged sword, though. On the plus side, it avoids call overhead and can often enable further optimizations, such as constant propagation or elimination of duplicate computation (see below for an example). On the negative side, it makes compilation take longer (because the compiler does more work), and it causes more code to be generated and stored in memory (because inlining a function effectively duplicates it). Moreover, in a dynamic language like JavaScript, where optimized code typically relies on guarded assumptions, inlining increases the risk of one of those assumptions turning out to be wrong, forcing a large amount of optimized code to be thrown away as a result.

Generally speaking, making perfect inlining decisions (not too much, not too little) requires predicting the future: knowing in advance how often and with which parameters the code will be executed. That is, of course, impossible, so optimizing compilers use various rules/"heuristics" to make guesses about what might be a reasonably good decision.

One rule that V8 currently has is: don't inline recursive calls.

That's why in the simpler version of your code, add will not get inlined into itself. The IIFE version essentially has two functions calling each other, which is called "mutual recursion" -- and as it turns out, this simple trick is enough to fool V8's optimizing compiler and make it sidestep its "don't inline recursive calls" rule. Instead, it happily inlines the unnamed lambda into add, and add into the unnamed lambda, and so on, until its inlining budget runs out after ~30 rounds. (Side note: "how much gets inlined" is one of the somewhat-complex heuristics and in particular takes function size into account, so whatever specific behavior we see here is indeed specific to this situation.)
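As an aside, "mutual recursion" simply means two functions that recurse only through each other, never directly through themselves. A classic textbook illustration (not from the original post) is the even/odd pair:

```javascript
// Classic mutual-recursion example: neither function calls
// itself directly; each recurses only through the other.
function isEven(n) {
  return n === 0 ? true : isOdd(n - 1);
}
function isOdd(n) {
  return n === 0 ? false : isEven(n - 1);
}

console.log(isEven(10)); // true
console.log(isOdd(7));   // true
```

The IIFE version of add has the same shape: add calls the unnamed lambda, and the lambda calls add, so neither function directly calls itself.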

In this particular scenario, where the involved functions are very small, inlining helps quite a bit because it avoids call overhead. So in this case, inlining gives better performance, even though it is a (disguised) case of recursive inlining, which in general often is bad for performance. And it does come at a cost: in the simple version, the optimizing compiler spends only 3 milliseconds compiling add, producing 562 bytes of optimized code for it. In the IIFE version, the compiler spends 30 milliseconds and produces 4318 bytes of optimized code for add. That's one reason why it's not as simple as concluding "V8 should always inline more": time and battery consumption for compiling matters, and memory consumption matters too, and what might be acceptable cost (and improve performance significantly) in a simple 10-line demo may well have unacceptable cost (and potentially even cost overall performance) in a 100,000-line app.


Now, having understood what's going on, we can get back to the "IIFEs have overhead" intuition, and craft an even faster version:

function add(n,m) {
  return add_inner(n, m);
}
function add_inner(n, m) {
  return n === 0 ? m : add(n - 1, m) + 1;
}

On my machine, I'm seeing:

  • simple version: 1650 ms
  • IIFE version: 720 ms
  • add_inner version: 460 ms

Of course, if you implement add(n, m) simply as return n + m, then it terminates in 2 ms -- algorithmic optimization beats anything an optimizing compiler could possibly accomplish :-)


Appendix: Example for benefits of optimization. Consider these two functions:

function Process(x) {
  return (x ** 2) + InternalDetail(x, 0, 2);
}

function InternalDetail(x, offset, power) {
  return (x + offset) ** power;
}

(Obviously, this is silly code; but let's assume it's a simplified version of something that makes sense in practice.)
When executed naively, the following steps happen:

  1. evaluate temp1 = (x ** 2)
  2. call InternalDetail with parameters x, 0, 2
  3. evaluate temp2 = (x + 0)
  4. evaluate temp3 = temp2 ** 2
  5. return temp3 to the caller
  6. evaluate temp4 = temp1 + temp3
  7. return temp4.

If an optimizing compiler performs inlining, then as a first step it will get:

function Process_after_inlining(x) {
  return (x ** 2) + ( (x + 0) ** 2 );
}

which allows two simplifications: x + 0 can be folded to just x, and then the x ** 2 computation occurs twice, so the second occurrence can be replaced by reusing the result from the first:

function Process_with_optimizations(x) {
  let temp1 = x ** 2;
  return temp1 + temp1;
}

So comparing with the naive execution, we're down to 3 steps from 7:

  1. evaluate temp1 = (x ** 2)
  2. evaluate temp2 = temp1 + temp1
  3. return temp2

I'm not predicting that real-world performance will go from 7 time units to 3 time units; this is just meant to give an intuitive idea of why inlining can help reduce computational load by some amount.
Footnote: to illustrate how tricky all this stuff is, consider that replacing x + 0 with just x is not always possible in JavaScript, even when the compiler knows that x is always a number: if x happens to be -0, then adding 0 to it changes it to +0, which may well be observable program behavior ;-)
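That -0 wrinkle is easy to observe from plain JavaScript (a small sketch, not part of the original answer):

```javascript
// -0 and +0 compare equal with ===, but Object.is distinguishes
// them (as does dividing: 1 / -0 === -Infinity).
console.log(-0 === 0);             // true
console.log(Object.is(-0, 0));     // false
console.log(Object.is(-0 + 0, 0)); // true: adding 0 turned -0 into +0
```

So a compiler may only fold x + 0 down to x when it can prove that x is never -0.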

Answered Oct 22 '22 by jmrk