Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do neither V8 nor spidermonkey seem to unroll static loops?

Doing a small check, it looks like neither V8 nor spidermonkey unroll loops, even if it is completely obvious, how long they are (literal as condition, declared locally):

const f = () => {
  let counter = 0;
  for (let i = 0; i < 100_000_000; i++) {
    counter++;
  }
  return counter;
};

const g = () => {
  let counter = 0;
  for (let i = 0; i < 10_000_000; i += 10) {
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
    counter++;
  }
  return counter;
}

let start = performance.now();
f();
let mid = performance.now();
g();
let end = performance.now();

console.log(
  `f took ${(mid - start).toFixed(2)}ms, g took ${(end - mid).toFixed(2)}ms, ` +
  `g was ${((mid - start)/(end - mid)).toFixed(2)} times faster.`
);

Is there any reason for this? They perform considerably more complex optimizations. Are standard for-loops just that uncommon in javascript, that it's not worth it?


Edit: Just as a note: one might argue, that perhaps optimization is delayed. This doesn't seem to be the case, although i am not an expert here. I used node --allow-natives-syntax --trace-deopt, performed optimization manually, and observed no deoptimization happening (snippet for collapsing, not actually runnable in a browser):

const { performance } = require('perf_hooks');

const f = () => {
  let counter = 0;
  for (let i = 0; i < 100_000_000; i++) {
    counter++;
  }
  return counter;
};
// collect metadata and optimize
f(); f();
%OptimizeFunctionOnNextCall(f);
f();

const start = performance.now();
f();
console.log(performance.now() - start);

Done with both the normal and unrolled version, the same effect prevails.

like image 514
Doofus Avatar asked Feb 06 '21 20:02

Doofus


2 Answers

(V8 developer here.)

TL;DR: because it's rarely worth it for real-world code.

Loop unrolling, like other optimizations that increase code size (such as inlining), is a double-edged sword. Yes, it can help; in particular it often helps for tiny toy examples like the one posted here. But it can also hurt performance, most obviously because it increases the amount of work that the compiler has to do (and hence the time it takes to do that work), but also through secondary effects like bigger code benefitting less from the CPU's caching efforts.

V8's optimizing compiler actually does like to unroll the first iteration of loops. Also, as it happens, we currently have an ongoing project to unroll more loops; the current status is that it sometimes helps and sometimes hurts, so we're still fine-tuning the heuristics for when it should kick in, and when it shouldn't. This difficulty also indicates that for real-world JavaScript, the benefits will typically be quite small.

It doesn't matter whether it's a "standard for-loop" or not; any loop can be unrolled in theory. It just happens to be the case that aside from microbenchmarks, loop unrolling tends to make little difference: there isn't all that much overhead to just doing another iteration, so if the loop body does more than counter++, there's not a lot to be gained from avoiding the per-iteration overhead. And, fwiw, this per-iteration overhead is not what your test is measuring: the repeated increments all get folded, so what you're really comparing here is 100M iterations of counter += 1 against 10M iterations of counter += 10.

So this is one of many examples of misleading microbenchmarks trying to trick us into drawing the wrong conclusions ;-)

like image 53
jmrk Avatar answered Nov 03 '22 08:11

jmrk


I recommend you read this answer as it explains it quite clearly. In a nutshell, unrolling won't mean that the code will run faster.

For example, if, instead of a simple counter++, you had a function call (taken from the linked answer):

function one() {
  // something complex running here, in this case a very long comment:
  // bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla bla
  
  return 1;
}

for (let i = 0; i < 1000; ++i) {
  counter += one();
}

If the function is short, on-stack replacement as well as inline the function will make the unrolling code faster, however if the function is long, the loop is actually faster (all examples again, taken from the answer I linked).

  counter += one();
  counter += one();
  ...

Now, from my personal journey back in the day from creating a simple compiler back in college using Assembly language (already optimized by the processor), and all the way up to C/C++ (which by themselves can already produce unbelievably efficient ASM code), and walking my way up to higher level languages such as PHP and Javascript:

My take is that the people in charge of optimization have to do a lot of heuristics and would most likely be interested in real-life code that can produce real-life results.

Now, I am not in the position to determine whether it's more common to do arithmetic in a for loop rather than calling a function, but my gut tells me that it's unlikely that for loops with simple arithmetic matter much in the huge ecosystems that browsers have become nowadays. Then again it's a great exercise to learn and dive deeper.

like image 21
Yuan-Hao Chiang Avatar answered Nov 03 '22 07:11

Yuan-Hao Chiang