 

JavaScript optimization with `new Function()`

While reading docs, I found a simple optimization that greatly improves JavaScript performance.

Original code:

function parseRow(columns, parser) {
  var row = {};
  for (var i = 0; i < columns.length; i++) {
    row[columns[i].name] = parser.readColumnValue();
  }
  return row;
}

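Optimized code:

// Build the source of a function whose body is a single object literal
// with one hardcoded property per column (runs once per set of columns)
var code = 'return {\n';
columns.forEach(function(column) {
  // JSON.stringify quotes the column name and escapes special characters
  code += JSON.stringify(column.name) + ': parser.readColumnValue(),\n';
});
code += '};\n';

// Compile the generated source into a function taking (columns, parser)
var parseRow = new Function('columns', 'parser', code);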

Found here: https://github.com/felixge/faster-than-c
Why does it run 20% faster?
I believe that it removes the for statement, but doesn't the forEach have the same computational cost?

asked Sep 13 '14 at 12:09 by ovi


2 Answers

The difference is that you are only using forEach to construct the optimized function. Once the function is created, there is no looping inside it: the loop is unrolled and the column names are hardcoded. The generated source is then evaluated into a working function, which, depending on the engine, may even be compiled to machine code. This results in two performance improvements:

  1. By removing the loop condition check (i < columns.length) completely, there is no branching, and
  2. By hardcoding the values of columns[i].name into separate statements, you remove the evaluation of columns[i] and the lookup of .name on every iteration.
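To make the two points above concrete, here is a small illustration (not a benchmark) that counts how often each version reads from `columns` while parsing one row, by wrapping the array in a `Proxy`. `makeParser` and `countReads` are hypothetical helpers standing in for the real parser. For three columns, the loop reads `columns.length` four times and `columns[i]` three times, while the generated function never touches `columns`; the `.name` reads on each element are not counted here.

```javascript
// Hypothetical stand-in parser: readColumnValue() returns the next value.
function makeParser(values) {
  var index = 0;
  return { readColumnValue: function () { return values[index++]; } };
}

// Loop-based version from the question (with `return row;`).
function parseRowLoop(columns, parser) {
  var row = {};
  for (var i = 0; i < columns.length; i++) {
    row[columns[i].name] = parser.readColumnValue();
  }
  return row;
}

var columns = [{ name: 'id' }, { name: 'title' }, { name: 'author' }];

// Generated version: one hardcoded property per column.
var code = 'return {\n';
columns.forEach(function (column) {
  code += JSON.stringify(column.name) + ': parser.readColumnValue(),\n';
});
code += '};\n';
var parseRowGenerated = new Function('columns', 'parser', code);

// Counts every property read on the columns array (length and [i]).
function countReads(parseRow) {
  var reads = 0;
  var watched = new Proxy(columns, {
    get: function (target, key, receiver) {
      reads++;
      return Reflect.get(target, key, receiver);
    }
  });
  parseRow(watched, makeParser([1, 'Hello', 'ovi']));
  return reads;
}

var loopReads = countReads(parseRowLoop);           // 4 x length + 3 x columns[i]
var generatedReads = countReads(parseRowGenerated); // columns is never read
console.log(loopReads, generatedReads); // 7 0
```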

So after calling new Function(...) with the code passed as a string, your parseRow variable holds a reference to a function equivalent to the following:

function parseRow(columns, parser) {
    return {
        "columnOne": parser.readColumnValue(),
        "columnTwo": parser.readColumnValue(),
        "columnThree": parser.readColumnValue(),
        ...
    };
}

Note that there aren't any loops, branching, or other lookups in that code, except for the multiple parser.readColumnValue() calls.

Why is this possible in JavaScript?

This works so efficiently in JavaScript because JavaScript source code in any web page has to be interpreted or compiled by the JS engine anyway. You don't ship your web page with compiled executables, or even with (somewhat) precompiled bytecode (as in Java or .NET). Every single time a new .js file is loaded, your browser compiles it from scratch before running it (to be precise, modern engines do something between interpreting and compiling, i.e. JIT compilation).

This means that creating a working function from a string (i.e. compiling the code) at runtime is no less efficient than having hand-written code read from the .js file. Compare that to a C/C++ program, which is (in all reasonable cases) compiled to machine code (i.e. an executable file, which is as close to the CPU as you can get) before it reaches the customer.

If you wanted to do this in C++ (a sort of self-modifying code), you would have to bundle a compiler with your app to build the code, and the cost of building the function would outweigh the benefits you would get once you finally ran it. In .NET, for example, it is also not unusual for a program to emit methods or even whole assemblies at run time, which then get JIT-compiled to machine code, allowing performance improvements such as the one in your question.

answered Oct 27 '22 at 09:10 by Groo


The performance gains depend a lot on the JavaScript engine, as well as on the data being processed. We don't know the exact circumstances of the "20% faster" claim (other than that node.js was used). It could be slower in some situations. (Edit: you would need to call the function often enough to outweigh the cost of constructing it.) Some possible reasons for the gains:
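Since the construction cost has to be amortized over many calls, one way to do that is to compile a parser once per distinct column layout and reuse it for every row. The sketch below uses hypothetical helpers `getRowParser` and `makeParser`; unlike the code in the question, the generated function here takes only `parser`, because its body never uses `columns`.

```javascript
// Hypothetical cache: one compiled row parser per distinct list of column
// names, so new Function() runs once per layout rather than once per row.
var rowParserCache = new Map();
var compiledCount = 0; // counts compilations, for illustration only

function getRowParser(columns) {
  var names = columns.map(function (column) { return column.name; });
  var key = JSON.stringify(names);
  var rowParser = rowParserCache.get(key);
  if (rowParser === undefined) {
    var code = 'return {\n';
    names.forEach(function (name) {
      code += JSON.stringify(name) + ': parser.readColumnValue(),\n';
    });
    code += '};\n';
    rowParser = new Function('parser', code);
    rowParserCache.set(key, rowParser);
    compiledCount++;
  }
  return rowParser;
}

// Hypothetical stand-in parser: readColumnValue() returns the next value.
function makeParser(values) {
  var index = 0;
  return { readColumnValue: function () { return values[index++]; } };
}

// Usage: three rows with the same columns, compiled only once.
var columns = [{ name: 'id' }, { name: 'title' }];
var parser = makeParser([1, 'first', 2, 'second', 3, 'third']);
var rows = [];
for (var r = 0; r < 3; r++) {
  rows.push(getRowParser(columns)(parser));
}
console.log(compiledCount); // 1
```

Keying the cache on the column names keeps the sketch simple; a real parser would more likely attach the compiled function to whatever object already describes the result set.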

The optimized code creates an object literal. The original version repeatedly assigns values to properties that do not exist yet, and adding a property has some cost associated with it (in engines such as V8, each addition can change the object's hidden class).

row[columns[i].name] involves three lookups (columns[i], .name, and the property on row), whereas the optimized version has none once the function is constructed. And don't forget that the property row[columns[i].name] doesn't exist yet, so that lookup is more expensive. columns.length is a lookup, too.
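To check claims like these on your own engine and data, a rough timing sketch for node.js might look like this. `makeParser`, `buildGenerated` and `timeIt` are hypothetical helpers, the 20 columns and 100000 iterations are arbitrary choices, and a single run like this is only indicative; no particular speedup is implied.

```javascript
// Rough timing sketch for Node.js. Results depend on the engine version,
// column count, warm-up and JIT behaviour; treat them as indicative only.

// Hypothetical stand-in parser: returns an increasing counter per call.
function makeParser() {
  var next = 0;
  return { readColumnValue: function () { return next++; } };
}

function parseRowLoop(columns, parser) {
  var row = {};
  for (var i = 0; i < columns.length; i++) {
    row[columns[i].name] = parser.readColumnValue();
  }
  return row;
}

function buildGenerated(columns) {
  var code = 'return {\n';
  columns.forEach(function (column) {
    code += JSON.stringify(column.name) + ': parser.readColumnValue(),\n';
  });
  code += '};\n';
  return new Function('columns', 'parser', code);
}

// Times setup plus `iterations` calls. Setup (e.g. new Function) is
// deliberately inside the timed region so construction cost is counted.
function timeIt(label, makeRowFn, iterations) {
  var parser = makeParser();
  var lastRow = null;
  var start = process.hrtime.bigint();
  var rowFn = makeRowFn();
  for (var i = 0; i < iterations; i++) {
    lastRow = rowFn(parser);
  }
  var elapsedNs = process.hrtime.bigint() - start;
  console.log(label + ': ' + (Number(elapsedNs) / 1e6).toFixed(1) + ' ms');
  return { elapsedNs: elapsedNs, lastRow: lastRow };
}

var columns = [];
for (var c = 0; c < 20; c++) {
  columns.push({ name: 'col' + c });
}
var ITERATIONS = 100000;

var loopResult = timeIt('for loop', function () {
  return function (parser) { return parseRowLoop(columns, parser); };
}, ITERATIONS);

var generatedResult = timeIt('new Function', function () {
  var parseRow = buildGenerated(columns); // construction cost included
  return function (parser) { return parseRow(columns, parser); };
}, ITERATIONS);
```

Lowering ITERATIONS makes the one-time construction cost a larger share of the generated version's total, which is the trade-off mentioned in the edit above.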

answered Oct 27 '22 at 09:10 by a better oliver