I am working to improve performance of a function that takes an XML string and replaces certain characters (encoding) before returning the string. The function gets pounded, so it is important to run as quickly as possible. The USUAL case is that none of the characters are present - so I would like to especially optimize for that. As you will see in the sample code, the strings to be replaced are short, and relatively few. The source strings will often be short (e.g. 10-20 characters) but could be longer (e.g. 200 characters or so).
So far I have ensured that the regexes are precompiled and I've eliminated nested functions which slowed down operation (partial milliseconds matter at this point.)
var objXMLToString = new XMLToStringClass();
function XMLToStringClass(){
this.tester= /\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&/;
this.replacements=[];
var self=this;
function init(){
var re = new regexReplacePair(/\\34/g,'"');
self.replacements.push(re);
re = new regexReplacePair(/\\39/g,"'");
self.replacements.push(re);
re = new regexReplacePair(/\\62/g,">");
self.replacements.push(re);
re = new regexReplacePair(/\\60/g,"<");
self.replacements.push(re);
re = new regexReplacePair(/\\13\\10/g,"\n");
self.replacements.push(re);
re = new regexReplacePair(/\\09/g,"\t");
self.replacements.push(re);
re = new regexReplacePair(/\\92/g,"\\");
self.replacements.push(re);
re = new regexReplacePair(/\&/g,"&");
self.replacements.push(re);
}
init();
}
function regexReplacePair(regex,replacementString){
this.regex = regex;
this.replacement = replacementString;
}
String.prototype.XMLToString = function() {
newString=this;
if(objXMLToString.tester.test(this)){
for (var x = 0;x<objXMLToString.replacements.length;x++){
newString=newString.replace(objXMLToString.replacements[x].regex,objXMLToString.replacements[x].replacement);
}
return newString;
}
return this;
}
String.replace
functions will be better? indexOf
be quicker than a regex for this dataset? I have benchmarked your original version, Ates Gorals hash, my improved hash, a version using switch, and the simple solution. The winner? The simple solution!
With matching data (string of 85 chars)
original simple hash switch ag-hash
FF3 194 188 250 240 424
IE7 188 172 641 906 2203
Chrome1 161 156 165 165 225
Opera9 640 625 531 515 734
With non matching data (string of 85 chars):
original simple hash switch ag-hash
FF3 39 4 34 34 39
IE7 125 15 125 125 156
Chrome1 45 2 54 54 57
Opera9 156 15 156 156 156
(tested on my window xp laptop, 1.7GHz, ymmv)
The simple solution:
function XMLToString(str) {
return (str.indexOf("\\")<0 && str.indexOf("&")<0) ? str :
str
.replace(/\\34/g,'"')
.replace(/\\39/g,"'")
.replace(/\\62/g,">")
.replace(/\\60/g,"<")
.replace(/\\13\\10/g,"\n")
.replace(/\\09/g,"\t")
.replace(/\\92/g,"\\")
.replace(/\&/g,"&");
}
Explanation:
First there is a check if there is a backslash or ampersand (it was faster to use indexOf instead of a regexp in all browsers). If there isn't the string is returned untouched, otherwise all the replacements are executed. In this case it don't makes mush difference to cache the regexps. I tried with two arrays, one with the regexps and one with the replacements, but it was not a big difference.
In your original version you have a test for all combinations just to detect if you should do the replacements or not. That is expensive since the regexp engine has to compare every position with every combination. I simplified it.
I improved Ates Gorals by moving the hash object outside (so it wasn't created and thrown away for every call to the inner function), moving the inner function outside so it can be reused instead of thrown away.
UPDATE 1 Bugfix: moved a parenthesis in the ampersand test.
UPDATE 2
One of your comments made me believe that you encode the string yourself, and if that is the case I suggest that you switch encoding to a standard one, so you can use the built in functions.
Instead of "\dd" where dd is a decimal number, use "%xx" where xx is the hexadecimal number. Then you can use the built in decodeURIComponent that is much faster and as a bonus can decode any characters, including unicode.
matching non match
FF3 44 3
IE7 93 16
Chrome1 132 1
Opera9 109 16
.
function XMLToString_S1(str) {
return (str.indexOf("%")<0) ? str : decodeURIComponent(str).replace(/\x0D\x0A/g,"\n")
}
So instead of having a string like "\09test \60\34string\34\62\13\10\" you have a string like "%09test %3c%22string%22%3e%0d%0a".
You can use a hash lookup:
str.replace(
/(\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&)/g,
function () {
return {
"\\34": "\"",
"\\39": "'",
//...
"&": "&"
}[arguments(1)];
}
);
Or you insists on extending the prototype:
var hash = {
"\\34": "\"",
"\\39": "'",
//...
"&": "&"
};
String.prototype.XMLToString = function () {
return this.replace(
/(\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&)/g,
function () { return hash[arguments(1)]; }
}
);
This might be faster (need to benchmark):
String.prototype.XMLToString = function () {
var s = this;
for (var r in hash) {
s = s.split(r).join(hash[r]);
}
return s;
);
You can generate your regex from the hash:
var arr = [];
for (var r in hash) {
arr.push(r);
}
var re = new RegExp("(" + arr.join("|") + ")", "g");
and then use it as:
s = s.replace(re, function () { ... });
Here's my stab at refactoring your code
Here's the code
var objXMLToString = {
tester: /\\34|\\39|\\62|\\60|\\13\\10|\\09|\\92|&/
,replacements: [
[/\\34/g,'"']
,[/\\39/g,"'"]
,[/\\62/g,">"]
,[/\\60/g,"<"]
,[/\\13\\10/g,"\n"]
,[/\\09/g,"\t"]
,[/\\92/g,"\\"]
,[/\&/g,"&"]
]
}
String.prototype.XMLToString = function()
{
var newString = this;
if ( objXMLToString.tester.test( this ) )
{
var x = 0, replacer;
while ( replacer = objXMLToString.replacements[x++] )
{
newString = newString.replace( replacer[0], replacer[1] );
}
}
return newString;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With