 

How to escape XML entities in JavaScript?

Tags:

javascript

In JavaScript (server-side Node.js), I'm writing a program that generates XML as output.

I am building the XML by concatenating strings:

str += '<' + key + '>';
str += value;
str += '</' + key + '>';

The problem is: what if value contains characters like '&', '>' or '<'? What's the best way to escape those characters?

Or is there a JavaScript library that can escape XML entities?

asked Oct 27 '11 16:10 by Zo72


People also ask

How do I escape an XML file?

XML escape characters: there are only five predefined entities:

&quot; for "
&apos; for '
&lt; for <
&gt; for >
&amp; for &

Which characters actually need escaping depends on where they are used (text content or attribute values).
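The five entities can also be applied in a single pass. This is a minimal sketch, assuming you always want all five escaped (which is safe in both text content and attribute values); escapeXml and XML_ENTITIES are hypothetical names, not a library API. Because each matched character is replaced exactly once, the & inside a freshly produced entity is never escaped again, so the order of the table does not matter.

```javascript
// Lookup table for the five predefined XML entities.
const XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

// Replace every special character in one pass; replaced text is not rescanned.
function escapeXml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

console.log(escapeXml(`Tom & "Jerry" <cat>`));
// prints: Tom &amp; &quot;Jerry&quot; &lt;cat&gt;
```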

How do you escape in JavaScript?

Use the backslash (\) escape character to prevent JavaScript from interpreting a quote as the end of the string: \' always yields a single quote and \" always yields a double quote, without breaking the string.
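A short illustration of those backslash escapes. Note that they only affect how the literal is written in source code: the resulting string contains plain quote characters, not XML entities, so it would still need XML escaping before going into a document.

```javascript
// \' lets a single quote appear inside a single-quoted literal,
// and \" lets a double quote appear inside a double-quoted literal.
const singleQuoted = 'It\'s escaped';
const doubleQuoted = "She said \"hi\"";

console.log(singleQuoted); // prints: It's escaped
console.log(doubleQuoted); // prints: She said "hi"
console.log(singleQuoted.includes('&apos;')); // prints: false
```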

What characters break XML?

The characters that must be escaped are & and <, plus > when it appears in the sequence ]]> inside text content (escaping > everywhere is harmless and common). In attributes, " or ' must also be escaped, depending on which character delimits the attribute value: attr="must use &quot; here, ' is allowed" and attr='must use &apos; here, " is allowed'. They are escaped using XML entities; for example, write &amp; for &.
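The attribute rule can be sketched as a helper that always escapes & and < but escapes only the quote character actually used as the delimiter. This is a minimal sketch under that assumption; escapeAttr and attr are hypothetical names. As in the answer below, & is replaced first so the & of the entities added afterwards is not escaped again.

```javascript
// Escape a value for an XML attribute delimited by `quote` ('"' or "'").
function escapeAttr(value, quote = '"') {
  // & and < are never allowed literally in attribute values; & goes first.
  const escaped = String(value).replace(/&/g, '&amp;').replace(/</g, '&lt;');
  // Only the delimiting quote has to be escaped; the other one is allowed.
  return quote === '"'
    ? escaped.replace(/"/g, '&quot;')
    : escaped.replace(/'/g, '&apos;');
}

// Build name="value" (or name='value') with the value escaped accordingly.
function attr(name, value, quote = '"') {
  return `${name}=${quote}${escapeAttr(value, quote)}${quote}`;
}

console.log(attr('title', `Tom & "Jerry's"`));
// prints: title="Tom &amp; &quot;Jerry's&quot;"
console.log(attr('title', `Tom & "Jerry's"`, "'"));
// prints: title='Tom &amp; "Jerry&apos;s"'
```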


1 Answer

HTML (and XML) encoding is simply replacing the &, ", ', < and > characters with their entity equivalents. Order matters: if you don't replace the & characters first, you'll double-encode some of the entities:

if (!String.prototype.encodeHTML) {
  String.prototype.encodeHTML = function () {
    // Replace & first, so the & in the entities added below is not escaped again
    return this.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;')
               .replace(/"/g, '&quot;')
               .replace(/'/g, '&apos;');
  };
}
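To see why the order matters, here are two standalone encoders that differ only in when & is replaced. They are hypothetical, trimmed to three characters for brevity, and written as plain functions rather than String.prototype methods so the comparison is self-contained.

```javascript
// & replaced first: the & of "&lt;" is produced after & escaping has run.
function encodeAmpFirst(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// & replaced last: it also hits the & of the entities just produced.
function encodeAmpLast(s) {
  return s.replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/&/g, '&amp;');
}

console.log(encodeAmpFirst('a < b')); // prints: a &lt; b
console.log(encodeAmpLast('a < b'));  // prints: a &amp;lt; b  (double-encoded)
```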

As @Johan B.W. de Vries pointed out, this will have issues with tag names. To clarify, I assumed it would be used for the value only.
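Under that assumption, the question's concatenation can escape only the value and validate the tag name instead, since entity references are not allowed inside element names. This is a minimal sketch: escapeText, element and SIMPLE_XML_NAME are hypothetical names, and the name pattern is a deliberately simplified ASCII-only subset of XML's Name rule (it rejects, for example, namespace prefixes with colons). Quotes are left alone because they do not need escaping in element content.

```javascript
// Simplified, ASCII-only approximation of a valid XML element name.
const SIMPLE_XML_NAME = /^[A-Za-z_][A-Za-z0-9._-]*$/;

// Escape text content; & is replaced first to avoid double-encoding.
function escapeText(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Validate (not escape) the key, escape the value, then concatenate.
function element(key, value) {
  if (!SIMPLE_XML_NAME.test(key)) {
    throw new Error(`Invalid element name: ${key}`);
  }
  return '<' + key + '>' + escapeText(value) + '</' + key + '>';
}

let str = '';
str += element('title', 'Fish & Chips <cheap>');
console.log(str); // prints: <title>Fish &amp; Chips &lt;cheap&gt;</title>
```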

Conversely, if you want to decode HTML entities¹, make sure you decode &amp; to & after everything else so that you don't double-decode any entities:

if (!String.prototype.decodeHTML) {
  String.prototype.decodeHTML = function () {
    // Decode &amp; last, so "&amp;lt;" becomes "&lt;" rather than "<"
    return this.replace(/&apos;/g, "'")
               .replace(/&quot;/g, '"')
               .replace(/&gt;/g, '>')
               .replace(/&lt;/g, '<')
               .replace(/&amp;/g, '&');
  };
}
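The same ordering issue in reverse: "&amp;lt;" is what the literal text "&lt;" looks like once encoded, so a correct decoder must turn it back into "&lt;". The two hypothetical standalone decoders below, trimmed to three entities, differ only in when &amp; is handled.

```javascript
// &amp; decoded last: the & it produces is never seen by the other replacements.
function decodeAmpLast(s) {
  return s.replace(/&gt;/g, '>').replace(/&lt;/g, '<').replace(/&amp;/g, '&');
}

// &amp; decoded first: its output "&lt;" is then decoded a second time.
function decodeAmpFirst(s) {
  return s.replace(/&amp;/g, '&').replace(/&gt;/g, '>').replace(/&lt;/g, '<');
}

console.log(decodeAmpLast('&amp;lt;'));  // prints: &lt;  (correct)
console.log(decodeAmpFirst('&amp;lt;')); // prints: <     (double-decoded)
```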

¹ Just the basics; this does not decode &copy; to © or other such entities.
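One possible extension within XML's own rules is to also decode numeric character references such as &#169; and &#xA9;. This is a minimal sketch, assuming only the five predefined entities and numeric references matter; HTML named entities like &copy; would need a full lookup table and are left untouched here. decodeXmlEntities and PREDEFINED are hypothetical names. Matching every reference with one regex also sidesteps the ordering problem, because replaced text is never rescanned.

```javascript
// XML's five predefined named entities.
const PREDEFINED = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Decode predefined entities plus decimal (&#169;) and hex (&#xA9;) references
// in a single pass; unknown named entities such as &copy; are kept as text.
function decodeXmlEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const code = body[1] === 'x'
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Keep 0 and out-of-range values (which would make fromCodePoint throw) as text
      return code > 0 && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(PREDEFINED, body) ? PREDEFINED[body] : match;
  });
}

console.log(decodeXmlEntities('&#169; &#xA9; &amp;lt; &copy;'));
// prints: © © &lt; &copy;
```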


As far as libraries are concerned, Underscore.js (or Lodash, if you prefer) provides an _.escape method that performs this escaping, along with a matching _.unescape.

answered Sep 29 '22 08:09 by zzzzBov