I’m using JavaScript to pull a value out from a hidden field and display it in a textbox. The value in the hidden field is encoded.
For example,

    <input id='hiddenId' type='hidden' value='chalk &amp; cheese' />

gets pulled into

    <input type='text' value='chalk &amp; cheese' />

via some jQuery to get the value from the hidden field (it's at this point that I lose the encoding):

    $('#hiddenId').attr('value')

The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I do not want the value to be chalk & cheese. I want the literal amp; to be retained.
Is there a JavaScript library or a jQuery method that will HTML-encode a string?
EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has since been modified, changing the temporary element from a div to a textarea, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API, as suggested in another answer.
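For reference, a DOMParser-based decode might look like the sketch below. This is not code from the answer being referred to; the function name `htmlDecodeWithDOMParser` is made up for illustration, and the sketch assumes a browser environment where `DOMParser` is available (it throws otherwise).

```javascript
// Hypothetical helper: decodes HTML entities by parsing the string as an
// HTML document and reading back its text content.
function htmlDecodeWithDOMParser(value) {
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available in this environment');
  }
  const doc = new DOMParser().parseFromString(String(value), 'text/html');
  return doc.documentElement.textContent;
}

// In a browser:
//   htmlDecodeWithDOMParser('chalk &amp; cheese')  returns 'chalk & cheese'
```

Note that this only covers decoding; for encoding, a string-replacement function is still needed.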
I use these functions:

    function htmlEncode(value){
      // Create an in-memory element, set its inner text (which is automatically encoded),
      // then grab the encoded contents back out. The element is never added to the DOM.
      return $('<textarea/>').text(value).html();
    }

    function htmlDecode(value){
      return $('<textarea/>').html(value).text();
    }

Basically a textarea element is created in memory, but it is never appended to the document.
In the htmlEncode function I set the innerText of the element and retrieve the encoded innerHTML; in the htmlDecode function I set the innerHTML of the element and retrieve the innerText.
Check a running example here.
The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.
Based on the escape templatetag in Django, which I guess is heavily used/tested already, I made this function which does what's needed.
It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.

    function htmlEscape(str) {
        return str
            .replace(/&/g, '&amp;')
            .replace(/"/g, '&quot;')
            .replace(/'/g, '&#39;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
    }

    // I needed the opposite function today, so adding here too:
    function htmlUnescape(str){
        return str
            .replace(/&quot;/g, '"')
            .replace(/&#39;/g, "'")
            .replace(/&lt;/g, '<')
            .replace(/&gt;/g, '>')
            .replace(/&amp;/g, '&');
    }

Update 2013-06-17:
 In the search for the fastest escaping I have found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
 (also referenced here: Fastest method to replace all instances of a character in a string)
 Some performance results here:
http://jsperf.com/htmlencoderegex/25
It gives a result string identical to that of the built-in replace chains above. I'd be very happy if someone could explain why it's faster!
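As a point of comparison, the same output can also be produced in a single pass, using one regular expression and a lookup table instead of five chained calls. This is only a sketch, not the replaceAll implementation linked above; the names `ESCAPE_MAP` and `htmlEscapeSinglePass` are made up for illustration, and whether it is actually faster depends on the engine, so measure before relying on it.

```javascript
// Hypothetical single-pass variant of htmlEscape (illustrative names).
const ESCAPE_MAP = {
  '&': '&amp;',
  '"': '&quot;',
  "'": '&#39;',
  '<': '&lt;',
  '>': '&gt;'
};

function htmlEscapeSinglePass(str) {
  // Each special character is looked up once. Because the input is scanned
  // only one time, the order of the replacements no longer matters.
  return String(str).replace(/[&"'<>]/g, function (ch) {
    return ESCAPE_MAP[ch];
  });
}

console.log(htmlEscapeSinglePass('chalk & "cheese"'));
// chalk &amp; &quot;cheese&quot;
```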
Update 2015-03-04:
 I just noticed that AngularJS are using exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435
They add a couple of refinements: they appear to be handling an obscure Unicode issue as well as converting all non-alphanumeric characters to entities. I was under the impression the latter was not necessary as long as you have a UTF-8 charset specified for your document.
I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44
Update 2016-04-06:
 You may also wish to escape forward-slash /. This is not required for correct HTML encoding, however it is recommended by OWASP as an anti-XSS safety measure. (thanks to @JNF for suggesting this in comments)
        .replace(/\//g, '&#x2F;');
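Putting the pieces together, a version with the optional forward-slash escape might look like the sketch below. The function names `htmlEscapeStrict` and `htmlUnescapeStrict` are hypothetical, chosen so they do not clash with the functions above. Note the ordering: `&` is escaped first and unescaped last, otherwise the ampersands belonging to the other entities would be processed twice.

```javascript
// Hypothetical names; same replacements as above plus the forward slash.
function htmlEscapeStrict(str) {
  return String(str)
    .replace(/&/g, '&amp;')    // must run first
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\//g, '&#x2F;');
}

function htmlUnescapeStrict(str) {
  return String(str)
    .replace(/&#x2F;/g, '/')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');   // must run last
}

const raw = 'chalk & cheese';
const encoded = htmlEscapeStrict(raw);
console.log(encoded); // chalk &amp; cheese

// The escaped string can be placed inside a quoted attribute value:
console.log("<input type='text' value='" + encoded + "' />");
```

Because the apostrophe and double quote are both escaped, the result is safe to drop into an attribute delimited by either kind of quote.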