I’m using JavaScript to pull a value out from a hidden field and display it in a textbox. The value in the hidden field is encoded.
For example,
<input id='hiddenId' type='hidden' value='chalk &amp; cheese' />
gets pulled into
<input type='text' value='chalk &amp; cheese' />
via some jQuery to get the value from the hidden field (it’s at this point that I lose the encoding):
$('#hiddenId').attr('value')
The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I do not want the value to be chalk & cheese. I want the literal amp; to be retained.
Is there a JavaScript library or a jQuery method that will HTML-encode a string?
EDIT: This answer was posted a long time ago, and its htmlDecode function introduced an XSS vulnerability. It has since been modified to use a textarea instead of a div as the temporary element, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API, as suggested in another answer.
I use these functions:
function htmlEncode(value){
  // Create an in-memory element, set its text content (which is automatically encoded),
  // then grab the encoded contents back out. The element never exists on the DOM.
  return $('<textarea/>').text(value).html();
}

function htmlDecode(value){
  // Parse the string as HTML inside a detached textarea, then read back the decoded text.
  return $('<textarea/>').html(value).text();
}
Basically, a textarea element is created in memory but never appended to the document. In the htmlEncode function I set the text content of the element and retrieve the encoded innerHTML; in the htmlDecode function I set the innerHTML of the element and retrieve its text content.
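For reference, here is a DOM-free sketch of roughly what the textarea round trip returns in a spec-compliant browser. It assumes the HTML fragment serialization rules for text nodes (escape &, U+00A0, < and >, but not quote marks); escapeLikeTextNode is a hypothetical name, and older browsers may behave differently.

```javascript
// Hypothetical helper: approximates what $('<textarea/>').text(value).html() returns,
// assuming the HTML spec's serialization rules for text nodes:
// '&' -> '&amp;', U+00A0 -> '&nbsp;', '<' -> '&lt;', '>' -> '&gt;'.
// Quote characters are NOT escaped by these rules.
function escapeLikeTextNode(value) {
  return String(value)
    .replace(/&/g, '&amp;')      // must come first so later entities are not re-escaped
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeLikeTextNode('chalk & "cheese" <b>'));
// chalk &amp; "cheese" &lt;b&gt;
```

Because quote marks pass through unchanged, output like this is not safe to place inside an attribute value on its own.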
The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.
Based on the escape template tag in Django, which I assume is already heavily used and tested, I made this function, which does what's needed.
It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue, and it encodes quote marks, which is essential if you're going to use the result inside an attribute value, for example.
function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')   // '&' first, so the entities added below aren't escaped again
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');  // '&amp;' last, so '&amp;lt;' decodes to '&lt;', not '<'
}
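The order of the replace calls is deliberate: & is escaped first in htmlEscape, and &amp; is unescaped last in htmlUnescape. A minimal sketch with two hypothetical functions, handling only & and <, shows the double encoding you get when escaping in the wrong order:

```javascript
// Correct order: '&' is replaced first, so the '&' characters introduced
// by later replacements (e.g. in '&lt;') are never escaped a second time.
function escapeAmpFirst(str) {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

// Deliberately wrong order, for comparison only: the '&' inside the
// freshly produced '&lt;' gets re-escaped into '&amp;lt;'.
function escapeAmpLast(str) {
  return str.replace(/</g, '&lt;').replace(/&/g, '&amp;');
}

console.log(escapeAmpFirst('a < b & c')); // a &lt; b &amp; c
console.log(escapeAmpLast('a < b & c'));  // a &amp;lt; b &amp; c
```

The same reasoning applies in reverse: if htmlUnescape replaced &amp; first, the text &amp;lt; would be decoded all the way to < instead of to the literal &lt;.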
Update 2013-06-17:
In the search for the fastest escaping, I found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25
It gives an identical result string to the built-in replace chains above. I'd be very happy if someone could explain why it's faster!
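The dumpsite replaceAll implementation is not reproduced here. As an alternative, engines that support ES2021 ship a built-in String.prototype.replaceAll that accepts plain string patterns. The following is a sketch of the same five substitutions using it; htmlEscapeReplaceAll is a hypothetical name, and no claim is made about its speed relative to the regex chain.

```javascript
// Same substitutions as htmlEscape, but with String.prototype.replaceAll (ES2021)
// and plain string patterns instead of global regular expressions.
// '&' still has to be replaced first, for the same reason as in the regex chain.
function htmlEscapeReplaceAll(str) {
  return str
    .replaceAll('&', '&amp;')
    .replaceAll('"', '&quot;')
    .replaceAll("'", '&#39;')
    .replaceAll('<', '&lt;')
    .replaceAll('>', '&gt;');
}

console.log(htmlEscapeReplaceAll(`<a title="Tom's">`));
// &lt;a title=&quot;Tom&#39;s&quot;&gt;
```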
Update 2015-03-04:
I just noticed that AngularJS uses exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435
They add a couple of refinements: they appear to handle an obscure Unicode issue as well as convert all non-alphanumeric characters to entities. I was under the impression the latter was not necessary as long as you have a UTF-8 charset specified for your document.
I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44
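A from-scratch sketch of the two refinements described above follows (this is not AngularJS's actual code, and toNumericEntities is a hypothetical name). Every character above printable ASCII becomes a numeric character reference, and iterating with for...of walks the string by code point, so a surrogate pair such as an emoji becomes one reference rather than two references to lone surrogates, which HTML parsers treat as errors.

```javascript
// Hypothetical helper: escape the usual HTML-special characters, and turn every
// character above printable ASCII (> U+007E) into a decimal numeric reference.
function toNumericEntities(str) {
  let out = '';
  // for...of iterates by Unicode code point, so a surrogate pair arrives as one `ch`.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (ch === '&') out += '&amp;';
    else if (ch === '<') out += '&lt;';
    else if (ch === '>') out += '&gt;';
    else if (ch === '"') out += '&quot;';
    else if (ch === "'") out += '&#39;';
    else if (cp > 0x7e) out += '&#' + cp + ';'; // full code point, e.g. 128512 for U+1F600
    else out += ch;
  }
  return out;
}

console.log(toNumericEntities('caf\u00E9 \u{1F600} <ok>'));
// caf&#233; &#128512; &lt;ok&gt;
```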
Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding; however, it is recommended by OWASP as an anti-XSS safety measure. (Thanks to @JNF for suggesting this in the comments.)
.replace(/\//g, '&#x2F;');
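One way to fold the slash into the escaping is sketched below (not the answer's own code; HTML_ESCAPES and htmlEscapeWithSlash are hypothetical names). A single regex with a lookup table handles all six characters in one pass, so the order of substitutions stops mattering.

```javascript
// Lookup table for the characters to escape; '/' is the optional OWASP addition.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;',
};

// One pass over the string: each matched character is replaced exactly once,
// so an '&' produced by a replacement is never escaped again.
function htmlEscapeWithSlash(str) {
  return String(str).replace(/[&<>"'\/]/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(htmlEscapeWithSlash('</script>')); // &lt;&#x2F;script&gt;
```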