In JavaScript (server-side Node.js) I'm writing a program that generates XML as output.
I am building the XML by concatenating strings:
str += '<' + key + '>'; str += value; str += '</' + key + '>';
The problem is: what if value contains characters like '&', '>' or '<'? What's the best way to escape those characters?
Or is there a JavaScript library that can escape XML entities?
XML escape characters
There are only five:
"  &quot;
'  &apos;
<  &lt;
>  &gt;
&  &amp;
Escaping characters depends on where the special character is used. The examples can be validated at the W3C Markup Validation Service.
Using the Escape Character ( \ )
We can use the backslash ( \ ) escape character to prevent JavaScript from interpreting a quote as the end of the string. The syntax of \' will always be a single quote, and the syntax of \" will always be a double quote, without any fear of breaking the string.
The only illegal characters are &, < and > (as well as " or ' in attributes, depending on which character is used to delimit the attribute value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed'). They're escaped using XML entities; in this case you want &amp; for &.
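The distinction between element text and attribute values can be sketched as follows. This is only an illustration: escapeText and escapeAttr are hypothetical helper names, not part of any library, and the quote parameter is an assumption about how the caller delimits the attribute.

```javascript
// Hypothetical helpers for illustration; not from any library.

// Escape a value that will be placed between tags.
function escapeText(s) {
  return String(s)
    .replace(/&/g, '&amp;') // & first, so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Escape a value that will be placed inside an attribute.
// Only the quote character used as the delimiter has to be escaped.
function escapeAttr(s, quote = '"') {
  const escaped = escapeText(s);
  return quote === '"'
    ? escaped.replace(/"/g, '&quot;')
    : escaped.replace(/'/g, '&apos;');
}

console.log(escapeText('a < b && c > d'));
// a &lt; b &amp;&amp; c &gt; d
console.log('attr="' + escapeAttr('say "hi", it\'s fine') + '"');
// attr="say &quot;hi&quot;, it's fine"
console.log("attr='" + escapeAttr('say "hi", it\'s fine', "'") + "'");
// attr='say "hi", it&apos;s fine'
```

Escaping both quote characters everywhere is also valid XML and is simpler if you don't want to track the delimiter.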
HTML encoding is simply replacing the &, ", ', < and > chars with their entity equivalents. Order matters: if you don't replace the & chars first, you'll double encode some of the entities:
if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    // Replace & first so the entities produced below are not re-encoded.
    return this.replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
      .replace(/'/g, '&apos;');
  };
}
As @Johan B.W. de Vries pointed out, this will have issues with the tag names. I would like to clarify that I made the assumption that this was being used for the value only.
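For the string concatenation in the question, a standalone function may be preferable to extending String.prototype. The following is a minimal sketch under that assumption; escapeXml, element and XML_ENTITIES are hypothetical names, and key is assumed to already be a valid XML tag name (it is not validated or escaped here).

```javascript
// Hypothetical lookup table and helpers for illustration.
const XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

// Single pass over the string: each special character is replaced exactly
// once, so the ordering problem of chained replace() calls does not arise.
function escapeXml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

// Builds <key>value</key>, escaping only the value.
function element(key, value) {
  return '<' + key + '>' + escapeXml(value) + '</' + key + '>';
}

console.log(element('note', 'Tom & Jerry <3'));
// <note>Tom &amp; Jerry &lt;3</note>
```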
Conversely, if you want to decode HTML entities¹, make sure you decode &amp; to & after everything else so that you don't double decode any entities:
if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    // Decode &amp; last so that text like "&amp;lt;" becomes "&lt;", not "<".
    return this.replace(/&apos;/g, "'")
      .replace(/&quot;/g, '"')
      .replace(/&gt;/g, '>')
      .replace(/&lt;/g, '<')
      .replace(/&amp;/g, '&');
  };
}
¹ Just the basics, not including &copy; to © or other such things.
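To see why the ampersand has to be decoded last, here is a small sketch that compares the two orderings on a single input. The function names are hypothetical and only &lt; and &amp; are handled, to keep the example short.

```javascript
// Correct order: &amp; is decoded after the other entities.
function decodeAmpLast(s) {
  return s.replace(/&lt;/g, '<').replace(/&amp;/g, '&');
}

// Wrong order: decoding &amp; first creates a new "&lt;" that the
// second replace() then decodes again.
function decodeAmpFirst(s) {
  return s.replace(/&amp;/g, '&').replace(/&lt;/g, '<');
}

// Encoded form of the literal text "&lt;" (an ampersand followed by "lt;").
const encoded = '&amp;lt;';

console.log(decodeAmpLast(encoded));  // &lt;
console.log(decodeAmpFirst(encoded)); // <
```

The second result has been decoded twice, so the original text is lost.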
As far as libraries are concerned, Underscore.js (or Lodash if you prefer) provides an _.escape method to perform this functionality.