
Why does adding in an immediately invoked lambda make my JavaScript code 2x faster?

I'm optimizing a compiler that targets JavaScript, and found a very interesting, if somewhat frustrating, case:

function add(n,m) {
  return n === 0 ? m : add(n - 1, m) + 1;
}
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

It takes 2.3s to complete on my machine[1]. But if I make a very small change:

function add(n,m) {
  return (() => n === 0 ? m : add(n - 1, m) + 1)();
}
var s = 0;
for (var i = 0; i < 100000; ++i) {
  s += add(4000, 4000);
}
console.log(s);

It completes in 1.1s. Notice that the only difference is the addition of an immediately invoked lambda, (() => ...)(), around the return expression of add. Why does this extra call make my program twice as fast?

[1] MacBook Pro 13" 2020, 2.3 GHz Quad-Core Intel Core i7, Node.js v15.3.0

Asked Nov 27 '20 by MaiaVictor


1 Answer

Interesting! From looking at the code, it seems fairly obvious that the IIFE-wrapped version should be slower, not faster: in every loop iteration, it creates a new function object and calls it (which the optimizing compiler will eventually avoid, but that doesn't kick in right away), so generally just does more work, which should be taking more time.

The explanation in this case is inlining.

A bit of background: inlining one function into another (instead of calling it) is one of the standard tricks that optimizing compilers perform in order to achieve better performance. It's a double-edged sword, though. On the plus side, it avoids call overhead and can often enable further optimizations, such as constant propagation or elimination of duplicate computation (see below for an example). On the negative side, it makes compilation take longer (because the compiler does more work), and it causes more code to be generated and stored in memory (because inlining a function effectively duplicates it). Moreover, in a dynamic language like JavaScript, where optimized code typically relies on guarded assumptions, inlining increases the risk of one of those assumptions turning out to be wrong, forcing a large amount of optimized code to be thrown away as a result.

Generally speaking, making perfect inlining decisions (not too much, not too little) requires predicting the future: knowing in advance how often and with which parameters the code will be executed. That is, of course, impossible, so optimizing compilers use various rules/"heuristics" to make guesses about what might be a reasonably good decision.

One rule that V8 currently has is: don't inline recursive calls.

That's why in the simpler version of your code, add will not get inlined into itself. The IIFE version essentially has two functions calling each other, which is called "mutual recursion" -- and as it turns out, this simple trick is enough to fool V8's optimizing compiler and make it sidestep its "don't inline recursive calls" rule. Instead, it happily inlines the unnamed lambda into add, and add into the unnamed lambda, and so on, until its inlining budget runs out after ~30 rounds. (Side note: "how much gets inlined" is one of the somewhat-complex heuristics and in particular takes function size into account, so whatever specific behavior we see here is indeed specific to this situation.)
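As an aside, "mutual recursion" simply means two functions that recurse only through each other, never directly through themselves. A classic textbook illustration (not from the original post) is the even/odd pair:

```javascript
// Classic mutual-recursion example: neither function calls
// itself directly; each recurses only through the other.
function isEven(n) {
  return n === 0 ? true : isOdd(n - 1);
}
function isOdd(n) {
  return n === 0 ? false : isEven(n - 1);
}

console.log(isEven(10)); // true
console.log(isOdd(7));   // true
```

The IIFE version of add has the same shape: add calls the unnamed lambda, and the lambda calls add, so neither function directly calls itself.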

In this particular scenario, where the involved functions are very small, inlining helps quite a bit because it avoids call overhead. So in this case, inlining gives better performance, even though it is a (disguised) case of recursive inlining, which in general often is bad for performance. And it does come at a cost: in the simple version, the optimizing compiler spends only 3 milliseconds compiling add, producing 562 bytes of optimized code for it. In the IIFE version, the compiler spends 30 milliseconds and produces 4318 bytes of optimized code for add. That's one reason why it's not as simple as concluding "V8 should always inline more": time and battery consumption for compiling matters, and memory consumption matters too, and what might be acceptable cost (and improve performance significantly) in a simple 10-line demo may well have unacceptable cost (and potentially even cost overall performance) in a 100,000-line app.


Now, having understood what's going on, we can get back to the "IIFEs have overhead" intuition, and craft an even faster version:

function add(n,m) {
  return add_inner(n, m);
}
function add_inner(n, m) {
  return n === 0 ? m : add(n - 1, m) + 1;
}

On my machine, I'm seeing:

  • simple version: 1650 ms
  • IIFE version: 720 ms
  • add_inner version: 460 ms

Of course, if you implement add(n, m) simply as return n + m, then it terminates in 2 ms -- algorithmic optimization beats anything an optimizing compiler could possibly accomplish :-)


Appendix: Example for benefits of optimization. Consider these two functions:

function Process(x) {
  return (x ** 2) + InternalDetail(x, 0, 2);
}

function InternalDetail(x, offset, power) {
  return (x + offset) ** power;
}

(Obviously, this is silly code; but let's assume it's a simplified version of something that makes sense in practice.)
When executed naively, the following steps happen:

  1. evaluate temp1 = (x ** 2)
  2. call InternalDetail with parameters x, 0, 2
  3. evaluate temp2 = (x + 0)
  4. evaluate temp3 = temp2 ** 2
  5. return temp3 to the caller
  6. evaluate temp4 = temp1 + temp3
  7. return temp4.

If an optimizing compiler performs inlining, then as a first step it will get:

function Process_after_inlining(x) {
  return (x ** 2) + ( (x + 0) ** 2 );
}

which allows two simplifications: x + 0 can be folded to just x, and then the x ** 2 computation occurs twice, so the second occurrence can be replaced by reusing the result from the first:

function Process_with_optimizations(x) {
  let temp1 = x ** 2;
  return temp1 + temp1;
}

So comparing with the naive execution, we're down to 3 steps from 7:

  1. evaluate temp1 = (x ** 2)
  2. evaluate temp2 = temp1 + temp1
  3. return temp2

I'm not predicting that real-world performance will go from 7 time units to 3 time units; this is just meant to give an intuitive idea of why inlining can help reduce computational load by some amount.
Footnote: to illustrate how tricky all this stuff is, consider that replacing x + 0 with just x is not always possible in JavaScript, even when the compiler knows that x is always a number: if x happens to be -0, then adding 0 to it changes it to +0, which may well be observable program behavior ;-)
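That -0 wrinkle is easy to observe from plain JavaScript (a small sketch, not part of the original answer):

```javascript
// -0 and +0 compare equal with ===, but Object.is distinguishes
// them (as does dividing: 1 / -0 === -Infinity).
console.log(-0 === 0);             // true
console.log(Object.is(-0, 0));     // false
console.log(Object.is(-0 + 0, 0)); // true: adding 0 turned -0 into +0
```

So a compiler may only fold x + 0 down to x when it can prove that x is never -0.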

Answered Oct 22 '22 by jmrk