
What is replaceAll performance secret? [HTML escape]

I spent some time looking for the best way to escape an HTML string and found some discussions on that: discussion 1, discussion 2. They led me to the replaceAll function. Then I ran performance tests and tried to find a solution with similar speed, with no success :(

Here is my final set of test cases. I found it on the net and expanded it with my own attempts (the 4 cases at the bottom), and I still cannot reach replaceAll() performance.

What is the secret that makes the replaceAll() solution so fast?

Greets!

Code snippets:

String.prototype.replaceAll = function(str1, str2, ignore) 
{
   return this.replace(new RegExp(str1.replace(/([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g,"\\$&"),(ignore?"gi":"g")),(typeof(str2)=="string")?str2.replace(/\$/g,"$$$$"):str2);
};

Credit to qwerty.

Fastest case so far:

html.replaceAll('&', '&amp;').replaceAll('"', '&quot;').replaceAll("'", '&#39;').replaceAll('<', '&lt;').replaceAll('>', '&gt;');
Saram asked Jul 03 '13 07:07



2 Answers

Finally I found it! Thanks to Jack for pointing me to the jsperf-specific behavior. He wrote:

"I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works."

The answer is:

replaceAll - this hits a jsperf limitation/bug triggered by the special sequence "\\$&", so the results were wrong.

compile() - when called with no arguments it changes the regexp definition to /(?:)/. I don't know whether that is a bug or intended, but the performance results were unreliable after it was called.

Here are my result-safe tests.

Finally, I prepared proper test cases.

The result is that, for HTML escaping, the best way is to use a native DOM-based solution, like:

document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML

or, if you repeat it many times, you can reuse variables that are prepared once:

//prepare variables
var DOMtext = document.createTextNode("test");
var DOMnative = document.createElement("span");
DOMnative.appendChild(DOMtext);

//main work for each case
function HTMLescape(html){
  DOMtext.nodeValue = html;
  return DOMnative.innerHTML;
}

Thank you all for collaborating and for posting comments and pointers.

jsperf bug description

Inside jsperf, String.prototype.replaceAll ended up defined as follows:

function (str1, str2, ignore) {
  return this.replace(new RegExp(str1.replace(repAll, "\\#{setup}"), (ignore ? "gi" : "g")), (typeof(str2) == "string") ? str2.replace(/\$/g, "$$") : str2);
}
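
To illustrate why the mangled definition could produce misleading timings, here is a minimal sketch comparing the two replacement sequences. The names `replaceAllOriginal`, `replaceAllMangled` and `META` are hypothetical; `META` is only assumed to stand in for `repAll`, whose definition is not shown above.

```javascript
// Assumed stand-in for `repAll`: the metacharacter class from the question.
var META = /([\/\,\!\\\^\$\{\}\[\]\(\)\.\*\+\?\|\<\>\-\&])/g;

// Intended version: "\\$&" puts a backslash in front of every metacharacter.
function replaceAllOriginal(s, str1, str2) {
  var pattern = new RegExp(str1.replace(META, "\\$&"), "g");
  return s.replace(pattern, str2.replace(/\$/g, "$$$$"));
}

// Mangled version: every metacharacter turns into the literal text \#{setup},
// so the resulting pattern only matches the string "#{setup}".
function replaceAllMangled(s, str1, str2) {
  var pattern = new RegExp(str1.replace(META, "\\#{setup}"), "g");
  return s.replace(pattern, str2.replace(/\$/g, "$$"));
}

var original = replaceAllOriginal("a & b", "&", "&amp;");
var mangled = replaceAllMangled("a & b", "&", "&amp;");

console.log(original); // "a &amp; b"
console.log(mangled);  // "a & b" (nothing was replaced)
```

Under these assumptions the mangled version finds no match for '&', '<' or '>', so it skips the actual replacement work and the benchmark measures something else.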
Saram answered Oct 14 '22 21:10


As far as performance goes, I find that the below function is as good as it gets:

String.prototype.htmlEscape = function() {
    var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

    return function() {
        return this
          .replace(amp_re, '&amp;')
          .replace(sq_re, '&#39;')
          .replace(quot_re, '&quot;')
          .replace(lt_re, '&lt;')
          .replace(gt_re, '&gt;');
    }
}();

It initializes the regular expressions and returns a closure that actually performs the replacement.
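
As a usage sketch, the same closure pattern can be written as a standalone function, so it can be tried without extending String.prototype. The name `escapeHtml` is hypothetical and not part of the answer above.

```javascript
// Hypothetical standalone variant of the closure pattern: the regular
// expressions are created once, when the outer function runs.
var escapeHtml = (function () {
  var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

  return function (html) {
    // '&' is replaced first so the entities added afterwards are not escaped again.
    return String(html)
      .replace(amp_re, '&amp;')
      .replace(sq_re, '&#39;')
      .replace(quot_re, '&quot;')
      .replace(lt_re, '&lt;')
      .replace(gt_re, '&gt;');
  };
})();

var escaped = escapeHtml('<a href="x">Tom & Jerry\'s</a>');
console.log(escaped);
// → &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```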

Performance test

I should note that the test results are strange; when .replaceAll() is defined inside Benchmark.prototype.setup it runs twice as fast compared to when it's defined globally (i.e. inside a <script> tag). I'm still not sure why that is, but it definitely must be related to how jsperf itself works.

Using RegExp.compile()

I wanted to avoid using a deprecated function, mostly because this kind of optimization should be done automatically by modern browsers. Here's a version with compiled expressions:

String.prototype.htmlEscape2 = function() {
    var amp_re = /&/g, sq_re = /'/g, quot_re = /"/g, lt_re = /</g, gt_re = />/g;

    if (RegExp.prototype.compile) {
        amp_re.compile();
        sq_re.compile();
        quot_re.compile();
        lt_re.compile();
        gt_re.compile();
    }

    return function() {
        return this
          .replace(amp_re, '&amp;')
          .replace(sq_re, '&#39;')
          .replace(quot_re, '&quot;')
          .replace(lt_re, '&lt;')
          .replace(gt_re, '&gt;');
    }
}();

Doing so blows everything else out of the water!

Performance test

The reason .compile() appears to give such a performance boost is that when you compile a global expression, e.g. /a/g, it gets converted to /(?:)/ (on Chrome), which renders it useless.

If compilation can't be done, a browser should throw an error instead of silently destroying it.
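
The effect described above can be checked with a short sketch. It assumes an engine that still ships the deprecated RegExp.prototype.compile and resets the pattern as described for Chrome; the variable names are illustrative only.

```javascript
var re = /&/g;

// Before compile(): every '&' is replaced.
var before = "a&b&c".replace(re, "&amp;");

var after = null;
// compile() is a deprecated legacy method, so guard in case it is missing.
if (typeof re.compile === "function") {
  re.compile(); // no arguments: pattern and flags are both reset
  after = {
    source: re.source, // expected to be the empty pattern "(?:)"
    global: re.global, // expected to be false, the "g" flag is gone
    result: "a&b&c".replace(re, "&amp;")
  };
}

console.log(before);
console.log(after);
```

If the engine behaves as assumed, the "compiled" expression matches only the empty string at position 0, so the escape function returns wrong output while doing almost no work, which is why the benchmark numbers look so good.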

Ja͢ck answered Oct 14 '22 22:10