 

escaping inside html tag attribute value


I am having trouble understanding how escaping works inside html tag attribute values that are javascript.

I was led to believe that you should always escape & ' " < >. So for JavaScript as an attribute value I tried:

<a href="javascript:alert(&apos;Hello&apos;);"></a>

It doesn't work. However:

<a href="javascript:alert(&#39;Hello&#39;);"></a>

and

<a href="javascript:alert('Hello');"></a>

both work in all browsers!

Now I am totally confused. If all my attribute values are enclosed in double quotes, does this mean I do not have to escape single quotes? Or are apos and ASCII 39 technically different characters, such that JavaScript requires ASCII 39 but not apos?

Myforwik asked Feb 08 '12 04:02




2 Answers

There are two types of “escapes” involved here, HTML and JavaScript. When interpreting an HTML document, the HTML escapes are parsed first.

As far as HTML is concerned, the rules within an attribute value are the same as elsewhere, plus one additional rule:

  • The less-than character < should be escaped. Usually &lt; is used for this. Technically, depending on HTML version, escaping is not always required, but it has always been good practice.
  • The ampersand & should be escaped. Usually &amp; is used for this. This, too, is not always obligatory, but it is simpler to do it always than to learn and remember when it is required.
  • The character that is used as the delimiter around the attribute value must be escaped inside it. If you use the ASCII quotation mark " as the delimiter, it is customary to escape its occurrences using &quot; whereas for the ASCII apostrophe, the entity reference &apos; is defined in some HTML versions only, so it is safest to use the numeric reference &#39; (or &#x27;).

You can escape > (or any other data character) if you like, but it is never needed.
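As a rough illustration of the three rules above, here is a minimal sketch in JavaScript. The helper name `escapeAttributeValue` is hypothetical (it is not a built-in or library function); it simply applies the replacements described in the list and uses the numeric reference for the apostrophe so that the result is safe with either delimiter.

```javascript
// Hypothetical helper, not a standard API: escapes text so it can be placed
// inside a quoted HTML attribute value, following the rules listed above.
function escapeAttributeValue(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // ampersand first, so the references added below are not re-escaped
    .replace(/</g, "&lt;")    // less-than
    .replace(/"/g, "&quot;")  // delimiter when the value is wrapped in double quotes
    .replace(/'/g, "&#39;");  // numeric reference rather than &apos;
}

const code = "javascript:alert('Hello');";
const html = '<a href="' + escapeAttributeValue(code) + '">Say hello</a>';
console.log(html);
// <a href="javascript:alert(&#39;Hello&#39;);">Say hello</a>
```

Escaping the apostrophe here is not strictly required, because the value is delimited by double quotes; the sketch escapes both quote characters only so that the same helper works with either delimiter.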

On the JavaScript side, there are some escape mechanisms (with \) in string literals. But these are a different issue, and not relevant in your case.

In your example, on a browser that conforms to current specifications, the JavaScript interpreter sees exactly the same code in all three cases: alert('Hello');. The browser has “unescaped” &apos; or &#39; to '. I was somewhat surprised to hear that &apos; is not universally supported these days, but it’s not an issue: there is seldom any need to escape the ASCII apostrophe in HTML (escaping is only needed within attribute values and only if you use the ASCII apostrophe as its delimiter), and when there is, you can use the &#39; reference.
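To make the order of the two steps concrete, here is a small sketch that imitates the HTML "unescape" step on an attribute value before any JavaScript is evaluated. The function `decodeReferences` and its `supportApos` flag are hypothetical and cover only the handful of references mentioned in this thread; a real HTML parser handles a much larger table and more edge cases.

```javascript
// Hypothetical, deliberately tiny decoder: it knows only these named references.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// supportApos = false imitates a parser that does not define &apos;
// (an assumption made for illustration, not a model of any specific browser).
function decodeReferences(value, supportApos = true) {
  return value.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return String.fromCodePoint(codePoint);
    }
    if (body === "apos" && !supportApos) return match; // unknown name: left as literal text
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeReferences("alert(&#39;Hello&#39;);"));         // alert('Hello');
console.log(decodeReferences("alert(&apos;Hello&apos;);"));       // alert('Hello');
console.log(decodeReferences("alert(&apos;Hello&apos;);", false)); // alert(&apos;Hello&apos;);
```

In the last call the reference is left undecoded, so the JavaScript interpreter would receive `alert(&apos;Hello&apos;);`, which is not valid JavaScript. That would be consistent with the first link in the question failing on a browser that does not recognise &apos;, while the &#39; version works everywhere.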

Jukka K. Korpela answered Oct 05 '22 08:10


&apos; is not a valid entity reference in every HTML version (HTML 4 does not define it). You should escape using &#39; instead.

Myforwik answered Oct 04 '22 08:10