
HTML-encoding lost when attribute read from input field

I’m using JavaScript to pull a value out from a hidden field and display it in a textbox. The value in the hidden field is encoded.

For example,

<input id='hiddenId' type='hidden' value='chalk &amp; cheese' /> 

gets pulled into

<input type='text' value='chalk &amp; cheese' /> 

via some jQuery to get the value from the hidden field (it’s at this point that I lose the encoding):

$('#hiddenId').attr('value') 

The problem is that when I read chalk &amp; cheese from the hidden field, JavaScript seems to lose the encoding. I do not want the value to be chalk & cheese. I want the literal amp; to be retained.

Is there a JavaScript library or a jQuery method that will HTML-encode a string?

AJM asked Aug 02 '09 21:08



2 Answers

EDIT: This answer was posted a long time ago, and the htmlDecode function introduced an XSS vulnerability. It has since been modified to use a textarea instead of a div as the temporary element, which reduces the XSS risk. Nowadays, though, I would encourage you to use the DOMParser API, as suggested in another answer.


I use these functions:

function htmlEncode(value){
  // Create an in-memory element, set its inner text (which is automatically encoded)
  // Then grab the encoded contents back out. The element never exists on the DOM.
  return $('<textarea/>').text(value).html();
}

function htmlDecode(value){
  return $('<textarea/>').html(value).text();
}

Basically a textarea element is created in memory, but it is never appended to the document.

In the htmlEncode function I set the innerText of the element and retrieve the encoded innerHTML; in the htmlDecode function I set the innerHTML of the element and retrieve its innerText.

Check a running example here.
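The EDIT note at the top of this answer recommends the DOMParser API for decoding. A minimal sketch of that approach might look like the following. It assumes a browser environment (DOMParser is a Web API and is not defined in plain Node.js), and it reuses the name htmlDecode purely for illustration.

```javascript
// Browser-only sketch: DOMParser is a Web API, so this will not work in plain Node.js.
// The function name is reused from the answer above for illustration only.
function htmlDecode(input) {
  // Parse the string as an HTML document. Documents created this way are inert:
  // scripts in them are not executed.
  var doc = new DOMParser().parseFromString(input, 'text/html');
  // The text content of the parsed document is the decoded string.
  return doc.documentElement.textContent;
}

// Example usage, guarded so the snippet does nothing where DOMParser is missing.
if (typeof DOMParser !== 'undefined') {
  console.log(htmlDecode('chalk &amp; cheese'));
}
```

Note that any markup in the input is dropped, since only the text content is returned.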

Christian C. Salvadó answered Sep 28 '22 09:09



The jQuery trick doesn't encode quote marks and in IE it will strip your whitespace.

Based on the escape templatetag in Django, which I guess is heavily used/tested already, I made this function which does what's needed.

It's arguably simpler (and possibly faster) than any of the workarounds for the whitespace-stripping issue - and it encodes quote marks, which is essential if you're going to use the result inside an attribute value for example.

function htmlEscape(str) {
    return str
        .replace(/&/g, '&amp;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// I needed the opposite function today, so adding here too:
function htmlUnescape(str){
    return str
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&amp;/g, '&');
}
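As a quick check of the two functions above, here is a small usage sketch. The functions are repeated so the snippet runs on its own. The sample strings and the htmlUnescapeWrongOrder helper are hypothetical and exist only to show why the order of the replacements matters: & has to be encoded first and &amp; decoded last.

```javascript
// Same replacement chains as above, repeated so this snippet is self-contained.
function htmlEscape(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function htmlUnescape(str) {
  return str
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// The question's value, as JavaScript sees it once the browser has decoded the attribute.
console.log(htmlEscape('chalk & cheese'));       // chalk &amp; cheese
console.log(htmlUnescape('chalk &amp; cheese')); // chalk & cheese

// Quote marks are encoded too, so the result is safe inside an attribute value.
var markup = '<input type="text" value="' + htmlEscape('chalk "and" cheese') + '" />';
console.log(markup); // <input type="text" value="chalk &quot;and&quot; cheese" />

// Hypothetical counter-example: decoding &amp; first decodes some input twice.
function htmlUnescapeWrongOrder(str) {
  return str
    .replace(/&amp;/g, '&')
    .replace(/&lt;/g, '<');
}

var literal = htmlEscape('&lt;');             // the text "&lt;" encoded once: &amp;lt;
console.log(htmlUnescape(literal));           // &lt;  (round trip preserved)
console.log(htmlUnescapeWrongOrder(literal)); // <     (decoded one level too far)
```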

Update 2013-06-17:
In the search for the fastest escaping I have found this implementation of a replaceAll method:
http://dumpsite.com/forum/index.php?topic=4.msg29#msg29
(also referenced here: Fastest method to replace all instances of a character in a string)
Some performance results here:
http://jsperf.com/htmlencoderegex/25

It produces a result string identical to that of the built-in replace chains above. I'd be very happy if someone could explain why it's faster!
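The linked replaceAll implementation is not reproduced here. For comparison, a different technique that also yields the same output as the replace chain is a single replace call with a character class and a lookup table. This is only a sketch: the names ENTITY_MAP and htmlEscapeSinglePass are made up for illustration, and I make no claim about its speed relative to the other versions, so measure it on your own inputs.

```javascript
// Hypothetical lookup table covering the same five characters as the replace chain.
var ENTITY_MAP = {
  '&': '&amp;',
  '"': '&quot;',
  "'": '&#39;',
  '<': '&lt;',
  '>': '&gt;'
};

// Scans the string once and swaps each special character for its entity.
function htmlEscapeSinglePass(str) {
  return str.replace(/[&"'<>]/g, function (ch) {
    return ENTITY_MAP[ch];
  });
}

console.log(htmlEscapeSinglePass('<a href="x">Tom & Jerry\'s</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Because every character is visited exactly once, the order of the entries does not matter here, unlike in the chained version where & must be handled first.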

Update 2015-03-04:
I just noticed that AngularJS uses exactly the method above:
https://github.com/angular/angular.js/blob/v1.3.14/src/ngSanitize/sanitize.js#L435

They add a couple of refinements: they appear to handle an obscure Unicode issue and also convert all non-alphanumeric characters to entities. I was under the impression that the latter was not necessary as long as you have a UTF-8 charset specified for your document.
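To illustrate what converting non-alphanumeric characters to entities means, here is a minimal sketch. It is not AngularJS's actual code. The function name encodeNonAlphanumeric and the choice of safe set (ASCII letters, digits and the space character) are assumptions made for this example.

```javascript
// Illustrative sketch only, not the AngularJS implementation.
// Every character outside the assumed safe set becomes a decimal numeric entity.
function encodeNonAlphanumeric(str) {
  // Array.from walks the string by code point, so a surrogate pair
  // (for example an emoji) is handled as one character rather than two halves.
  return Array.from(str, function (ch) {
    if (/^[A-Za-z0-9 ]$/.test(ch)) {
      return ch; // assumed safe: leave as-is
    }
    return '&#' + ch.codePointAt(0) + ';';
  }).join('');
}

console.log(encodeNonAlphanumeric('a<b & c')); // a&#60;b &#38; c
console.log(encodeNonAlphanumeric('caf\u00e9')); // caf&#233;
```

Handling surrogate pairs as a single code point is presumably the kind of Unicode issue mentioned above, though that is my reading rather than something the AngularJS source states here.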

I will note that (4 years later) Django still does not do either of these things, so I'm not sure how important they are:
https://github.com/django/django/blob/1.8b1/django/utils/html.py#L44

Update 2016-04-06:
You may also wish to escape the forward slash /. This is not required for correct HTML encoding; however, it is recommended by OWASP as an anti-XSS safety measure. (Thanks to @JNF for suggesting this in the comments.)

        .replace(/\//g, '&#x2F;'); 
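Put together with the earlier chain, the complete function would look roughly like this. The name htmlEscapeWithSlash is made up here to keep it distinct from the original htmlEscape.

```javascript
// The original replace chain plus the forward-slash replacement.
// The slash is handled last; the entities produced by the earlier steps
// contain no slash, so the order of that final step does not affect the output.
function htmlEscapeWithSlash(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\//g, '&#x2F;');
}

console.log(htmlEscapeWithSlash('</script>')); // &lt;&#x2F;script&gt;
```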
Anentropic answered Sep 28 '22 08:09
