
Why does the browser automatically unescape HTML tag attribute values?

Below I have an HTML tag, and I use JavaScript to read the value of its widget attribute. This code alerts <test> instead of &lt;test&gt;, so the browser automatically unescapes attribute values:

alert(document.getElementById("hau").getAttribute("widget"))
<div id="hau" widget="&lt;test&gt;"></div>

My questions are:

  1. Can this behavior be prevented in any way, besides doing a double escape of the attribute contents? (It would look like this: &amp;lt;test&amp;gt;)
  2. Does anyone know why the browser behaves like this? Is there any place in the HTML specs that this behavior is mentioned explicitly?
pax162 asked Sep 07 '16 06:09


2 Answers

1) It can be done without doing a double escape

What you want is essentially an htmlEncode() step: read the attribute value (which the browser has already decoded) and encode it again. If you don't mind using jQuery:

alert(htmlEncode($('#hau').attr('widget')))

function htmlEncode(value){
  // Create an in-memory div and set its inner text (which jQuery automatically encodes),
  // then grab the encoded contents back out. The div never exists on the page.
  return $('<div/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="hau" widget="&lt;test&gt;"></div>

If you prefer a plain JavaScript solution:

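alert(htmlEncode(document.getElementById("hau").getAttribute("widget")))
// Wrap the string in a text node inside a detached element, then read the
// element's innerHTML, which serializes the text with <, > and & escaped.
function htmlEncode( html ) {
    return document.createElement( 'a' ).appendChild( 
        document.createTextNode( html ) ).parentNode.innerHTML;
}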
<div id="hau" widget="&lt;test&gt;"></div>

2) Why does the browser behave like this?

This behaviour is what makes certain things possible, such as including quotes inside a pre-filled input field, as shown below. If the only way to insert " were to write the character itself, it would end the attribute value early, and HTML would need some other escaping mechanism, such as a \ prefix.

<input type='text' value="&quot;You &apos;should&apos; see the double quotes here&quot;" />
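To make that point concrete, here is a minimal sketch in plain JavaScript (it runs in Node, no DOM needed). The helper name readDoubleQuotedValue and the two sample tags are made up for illustration, and the regex is a deliberately naive stand-in for a real HTML parser:

```javascript
// Naive illustration only: a regex is not an HTML parser.
// Reads the text between value=" and the next double quote.
function readDoubleQuotedValue(tag) {
  const match = /value="([^"]*)"/.exec(tag);
  return match ? match[1] : null;
}

// A raw double quote ends the attribute value immediately.
const rawQuote = '<input type="text" value=""quoted"">';
// Character references keep the quotes inside the value.
const escapedQuote = '<input type="text" value="&quot;quoted&quot;">';

console.log(JSON.stringify(readDoubleQuotedValue(rawQuote)));     // ""
console.log(JSON.stringify(readDoubleQuotedValue(escapedQuote))); // "&quot;quoted&quot;"
```

The raw quote cuts the value off before any text, while the escaped form survives intact and can be decoded afterwards.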
Saravanabalagi Ramachandran answered Nov 07 '22 06:11


The browser unescapes the attribute value as soon as it parses the document (mentioned here). One likely reason is that it would otherwise be impossible to include, for example, double quotes in a double-quoted attribute value. (Technically you could wrap the value in single quotes instead, but then you wouldn't be able to include single quotes in the value.)
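As a rough illustration of that parsing step, here is a sketch in plain JavaScript. decodeBasicEntities is a hypothetical helper that handles only five named references; a real HTML parser handles many more, plus numeric references. It also shows why the double escape from the question ends up as &lt;test&gt; after parsing:

```javascript
// Hypothetical helper: decodes only five named character references.
const NAMED_REFERENCES = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&apos;": "'",
  "&amp;": "&",
};

function decodeBasicEntities(raw) {
  // Single pass over the input, so "&amp;lt;" becomes "&lt;"
  // and is not decoded a second time.
  return raw.replace(/&(?:lt|gt|quot|apos|amp);/g, (ref) => NAMED_REFERENCES[ref]);
}

console.log(decodeBasicEntities("&lt;test&gt;"));         // <test>
console.log(decodeBasicEntities("&amp;lt;test&amp;gt;")); // &lt;test&gt;
```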

That said, the behavior cannot be prevented. If you really need the value with the HTML entities still in it, you can simply convert the special characters back into their codes (I recommend Underscore's escape for such a task).
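If you'd rather not pull in a library, a minimal sketch of such an escape function could look like this. escapeHtml is a hypothetical name, and the set of characters it covers is an assumption, not a copy of Underscore's implementation:

```javascript
// Hypothetical helper: re-encodes the characters that matter in HTML text
// and attribute values. The character set is an assumption for illustration.
function escapeHtml(text) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // One regex pass, so an "&" produced by a replacement is never escaped again.
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

console.log(escapeHtml("<test>")); // &lt;test&gt;
```

In the browser you would pass it the decoded attribute value, for example the result of getAttribute("widget").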

lucasnadalutti answered Nov 07 '22 08:11