Why should the ampersand be escaped to prevent XSS injection?

The five characters that OWASP recommends escaping to prevent XSS injection are &, <, >, ", and '.

Of these, I cannot understand why & (ampersand) needs to be escaped, or how it could be used as a vector to inject script. Can somebody give an example where the other four characters are escaped but the ampersand is not, and the result is an XSS injection vulnerability?

I have checked the other question, but that answer does not really make things any clearer.

asked Aug 29 '16 at 20:08 by Jinxin Ni



1 Answer

The answer to that question addresses the issue only in a nested JavaScript context within an HTML attribute, whereas your question asks specifically about escaping in a pure HTML context.

In that question, the escaping should be as per the OWASP recommendation for JavaScript:

Except for alphanumeric characters, escape all characters with the \uXXXX unicode escaping format (X = Integer).

This already handles &, because it is not alphanumeric.
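A minimal sketch of that OWASP JavaScript rule, assuming the value is being placed inside a quoted JavaScript string literal. The function name `js_escape` is hypothetical. It works on UTF-16 code units, so characters outside the Basic Multilingual Plane come out as surrogate pairs, which is what a JavaScript string expects.

```python
def js_escape(value: str) -> str:
    """Hypothetical helper: escape every non-alphanumeric character as \\uXXXX."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)  # ASCII letters and digits pass through unchanged
        else:
            # Encode as UTF-16 so astral characters become two \\uXXXX escapes
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)

print(js_escape("a&b'"))  # a\u0026b\u0027
```

Because & (U+0026) is not alphanumeric, it is always escaped by this rule, which is why the ampersand needs no special treatment in that context.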

To answer your question, from a practical point of view: why wouldn't you escape the ampersand?

The HTML representation of & is &amp;, so escaping it makes a lot of sense. If you didn't, any time a user entered &amp, &lt, or &gt into your application, it would render &, <, or > instead of &amp, &lt, or &gt.
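A small sketch of that edge case, assuming `html.unescape` from the Python standard library is a reasonable stand-in for how a browser decodes character references in HTML text. The partial encoder `escape_without_amp` is hypothetical: it escapes the other four characters but leaves & alone.

```python
import html

def escape_without_amp(s: str) -> str:
    # Hypothetical partial encoder: escapes < > " ' but NOT &
    return (s.replace("<", "&lt;").replace(">", "&gt;")
             .replace('"', "&quot;").replace("'", "&#x27;"))

user_input = "Tom &amp Jerry"
# html.unescape approximates the browser's decoding of the rendered text
print(html.unescape(escape_without_amp(user_input)))  # Tom & Jerry  (input altered)
print(html.unescape(html.escape(user_input)))          # Tom &amp Jerry  (input preserved)
```

The user's text is displayed incorrectly, but a decoded reference such as &lt becomes a literal < in text, not the start of a tag, so this is a correctness bug rather than an injection.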

An edge case? Definitely. A security concern? It shouldn't be.

From the HTML5 syntax Character references section:

Character references must start with a U+0026 AMPERSAND character (&). Following this, there are three possible kinds of character references:

  • Named character references
  • Decimal numeric character reference
  • Hexadecimal numeric character reference
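As a quick illustration, the same character (& itself, U+0026) can be written as each of the three kinds of reference. This uses Python's `html.unescape`, which decodes references following the HTML5 rules.

```python
import html

# Named, decimal numeric and hexadecimal numeric references for U+0026
named, decimal, hexadecimal = "&amp;", "&#38;", "&#x26;"
print([html.unescape(r) for r in (named, decimal, hexadecimal)])  # ['&', '&', '&']
```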

When an & is encountered:

Switch to the data state.

Attempt to consume a character reference, with no additional allowed character.

If nothing is returned, emit a U+0026 AMPERSAND character (&) token.

Otherwise, emit the character tokens that were returned.
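The steps above can be sketched as a simplified decoder for text in the data state. This is not the full WHATWG algorithm: the function name `decode_data` is hypothetical, only semicolon-terminated references are recognised (legacy forms like &amp without ; are left alone), and the spec's remapping of code points 0x80 to 0x9F is skipped. It uses the `html.entities.html5` table of named references.

```python
import re
from html.entities import html5  # named references, e.g. "amp;" -> "&"

# Only these shapes can be consumed after '&'; quotes and angle brackets never can.
_REF = re.compile(r"#[xX]([0-9A-Fa-f]+);|#([0-9]+);|([A-Za-z][A-Za-z0-9]*;)")

def decode_data(text: str) -> str:
    """Simplified sketch of character reference handling in the data state."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        m = _REF.match(text, i + 1)  # attempt to consume a character reference
        if m and m.group(3) and m.group(3) in html5:
            out.append(html5[m.group(3)])  # named reference
            i = m.end()
        elif m and (m.group(1) or m.group(2)):
            code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
            valid = 0 < code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF
            out.append(chr(code) if valid else "\ufffd")  # numeric reference
            i = m.end()
        else:
            out.append("&")  # nothing returned: emit the '&' itself
            i += 1
    return "".join(out)

print(decode_data('&amp; &#60; &#x3e; &"<script>'))
```

In the last case nothing after the & can be consumed, so the & is emitted as-is and the following " and < are processed normally.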

Therefore, whatever follows the & results in either the & itself or the referenced character being output. Because only alphanumeric characters (plus the # and ; used in reference syntax) can be consumed as part of a reference, there is no chance of a context-breaking character such as ', ", > or < being consumed and ignored. There is therefore little security risk of an attacker changing the parsing context this way.

However, you never know whether a browser has a bug that doesn't quite follow the standard, so I would always escape &. For example, Internet Explorer had an issue where <% would be interpreted as <, allowing .NET Request Validation to be bypassed for XSS attack vectors. Always better to be safe than sorry.
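A minimal sketch of an encoder that escapes all five characters, intended for HTML text and quoted attribute values. The names `escape_html` and `_REPLACEMENTS` are hypothetical. The one design point worth noting is ordering: & must be replaced first, otherwise the & inside entities produced by the later replacements would be escaped a second time.

```python
_REPLACEMENTS = [
    ("&", "&amp;"),   # must run first to avoid double-escaping later entities
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#x27;"),
]

def escape_html(value: str) -> str:
    """Hypothetical encoder for OWASP's five HTML-significant characters."""
    for raw, entity in _REPLACEMENTS:
        value = value.replace(raw, entity)
    return value

print(escape_html("<a href='x'>Tom & Jerry</a>"))
# &lt;a href=&#x27;x&#x27;&gt;Tom &amp; Jerry&lt;/a&gt;
```

In Python, `html.escape(value, quote=True)` performs the same five replacements in the same order, so in practice you would use that rather than a hand-rolled version.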

answered Nov 15 '22 at 10:11 by SilverlightFox