
Are there other sequences browsers interpret as HTML special characters?

In HTML, several special characters (< > & ' ") have significance to the DOM parser. These are the characters that popular functions such as PHP's htmlspecialchars convert to HTML entities so that they don't accidentally trigger something when parsed.

The translations performed are:

  • '&' (ampersand) becomes &amp;
  • " (double quote) becomes &quot; when ENT_NOQUOTES is not set.
  • ' (single quote) becomes &#039; only when ENT_QUOTES is set.
  • '<' (less than) becomes &lt;
  • '>' (greater than) becomes &gt;
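
As a rough illustration, the translations listed above can be sketched as a small lookup table. This is written in Python rather than PHP; the function name escape_html and its two boolean parameters are hypothetical stand-ins for htmlspecialchars and its ENT_NOQUOTES / ENT_QUOTES flags, not PHP's actual implementation.

```python
def escape_html(text: str, double_quotes: bool = True, single_quotes: bool = False) -> str:
    """Escape the HTML special characters from the list above (illustrative only)."""
    # '&', '<' and '>' are always translated.
    table = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
    if double_quotes:   # hypothetical stand-in for "ENT_NOQUOTES is not set"
        table['"'] = "&quot;"
    if single_quotes:   # hypothetical stand-in for "ENT_QUOTES is set"
        table["'"] = "&#039;"
    # str.translate works in a single pass, so the '&' that a replacement
    # introduces is never escaped a second time.
    return text.translate(str.maketrans(table))


print(escape_html('<a href="x">Tom & Jerry</a>'))
print(escape_html("it's", single_quotes=True))
```

The single-pass translation matters: replacing '<' first and '&' afterwards with chained string replacements would turn &lt; into &amp;lt;.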

However, I remember that in older browsers like IE6, there were also other byte sequences that caused the browser's DOM parser to interpret content as HTML.

Is this still a problem today? If you filter only these five characters, is that enough to prevent XSS?

For example, here are all the known combinations of the character "<" in HTML and JavaScript (in UTF-8).

<
%3C
&lt
&lt;
&LT
&LT;
&#60
&#060
&#0060
&#00060
&#000060
&#0000060
&#60;
&#060;
&#0060;
&#00060;
&#000060;
&#0000060;
&#x3c
&#x03c
&#x003c
&#x0003c
&#x00003c
&#x000003c
&#x3c;
&#x03c;
&#x003c;
&#x0003c;
&#x00003c;
&#x000003c;
&#X3c
&#X03c
&#X003c
&#X0003c
&#X00003c
&#X000003c
&#X3c;
&#X03c;
&#X003c;
&#X0003c;
&#X00003c;
&#X000003c;
&#x3C
&#x03C
&#x003C
&#x0003C
&#x00003C
&#x000003C
&#x3C;
&#x03C;
&#x003C;
&#x0003C;
&#x00003C;
&#x000003C;
&#X3C
&#X03C
&#X003C
&#X0003C
&#X00003C
&#X000003C
&#X3C;
&#X03C;
&#X003C;
&#X0003C;
&#X00003C;
&#X000003C;
\x3c
\x3C
\u003c
\u003C
asked Dec 24 '11 19:12 by Xeoncross




3 Answers

No. I actually looked into this when I was researching the use of CSS and attributes to automatically assign styles based on content (my question), and the short answer is no. Modern browsers do not allow 'byte sequences' to be used as HTML. I use the term 'byte sequences' loosely, because the most at-risk code does not use byte-encoded values.
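
A rough way to see this without a browser is to run the entity forms from the question through a standards-style HTML parser. The sketch below uses Python's html.parser as a stand-in for a browser (that substitution is an assumption, and the TagRecorder class and parse helper are hypothetical names). The entity-encoded variants decode to a "<" character in text, but they never open a tag.

```python
from html import unescape
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records which start tags and which text the parser reports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup: str):
    parser = TagRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.tags, "".join(parser.text)


# A few of the entity spellings from the question all decode to "<".
decoded = [unescape(s) for s in ("&lt;", "&LT", "&#60", "&#x3c;", "&#X0003C;")]

# Entity-encoded markup is reported as text only; literal markup opens a tag.
encoded_tags, encoded_text = parse("&#x3c;script&#x3e;alert(1)&#x3c;/script&#x3e;")
literal_tags, literal_text = parse("<script>alert(1)</script>")

print(decoded)
print(encoded_tags, encoded_text)
print(literal_tags, literal_text)
```

The %3C and \x3c / \u003c forms are URL and JavaScript escapes respectively, so an HTML parser leaves them untouched; they only matter once the value reaches a URL decoder or a JavaScript string.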

The examples listed on the XSS site are about using attributes and having the JavaScript interpreted as a string that then gets executed. Also listed are things like &{alert('XSS')}, which runs the code within the braces; that code does not work in modern browsers.

But to answer your second question: no, filtering those 5 is not enough to prevent an XSS attack. Always run user input through PHP's htmlspecialchars, but there are hundreds of byte codes that can be used, and you won't really be able to guarantee anything. Sending it through a PHP filter (especially htmlentities()) will give you the exact text entered when you output it to HTML (i.e. &laquo; instead of «). That said, depending on how you use the input, htmlspecialchars is enough to cover most attacks, and for the most part it will be safe.

XSS is a tricky thing to account for. A good general rule is to always filter everything that a user enters, and to use whitelisting instead of blacklisting. What you're talking about here would be blacklisting these values, when it is always safer to assume your users are malicious and allow only certain things.
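
To make the whitelisting idea concrete, here is a minimal sketch in Python. The username rule (3 to 20 ASCII letters, digits, or underscores) and the names USERNAME_RE and is_valid_username are invented for illustration; the point is only that the check describes what is allowed instead of listing what is forbidden.

```python
import re

# Hypothetical rule: 3-20 ASCII letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(value: str) -> bool:
    # fullmatch requires the whole string to fit the pattern, so stray
    # characters before or after a valid-looking name are rejected too.
    return USERNAME_RE.fullmatch(value) is not None


for candidate in ("Xeoncross", "<script>", "ab", "bob\n"):
    print(repr(candidate), is_valid_username(candidate))
```

Anything the pattern does not explicitly permit is rejected, so a new trick for encoding "<" does not need a new rule.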

answered Oct 19 '22 15:10 by LoveAndCoding


Here is an example: <button onclick="confirm('Are you sure you want to delete &#39;);alert(&#39;xss')"> Here the attacker's input is what comes after "delete" and before ')">

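This escaping will not work in this case, because we escaped for the wrong context: the browser decodes &#39; back into ' before handing the attribute value to the JavaScript engine, so the injected quote still ends the string.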

In short, XSS prevention means escaping for the given context. In the example above we are in a JavaScript context within an HTML attribute context. See the OWASP XSS prevention cheat sheet.
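
The two layers can be sketched as follows. This is an illustration in Python, where html.escape, html.unescape, and json.dumps stand in for the server-side escaping functions and for the browser's attribute decoding; the helper name js_string_in_attribute is hypothetical.

```python
import html
import json

attacker_input = "');alert('xss"

# Wrong context: only HTML-escape the input and splice it into the JavaScript.
unsafe_attr = (
    "confirm('Are you sure you want to delete "
    + html.escape(attacker_input, quote=True)
    + "')"
)
# The browser decodes entities in the attribute value before running it.
unsafe_js = html.unescape(unsafe_attr)


def js_string_in_attribute(value: str) -> str:
    """Escape for the inner context (JavaScript) first, then the outer one (HTML attribute)."""
    js_literal = json.dumps(value)              # a quoted JavaScript string literal
    return html.escape(js_literal, quote=True)  # safe inside a quoted attribute


message = "Are you sure you want to delete " + attacker_input
safe_attr = "confirm(" + js_string_in_attribute(message) + ")"
safe_js = html.unescape(safe_attr)

print(unsafe_js)  # the injected alert() is now a separate statement
print(safe_js)    # the whole message stays inside one string literal
```

The order matters: the value is encoded for the innermost context first and for each enclosing context afterwards, mirroring the order in which the browser will decode it.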

answered Oct 19 '22 15:10 by Erlend


It suffices to escape text in HTML, but there are contexts in HTML where even text is dangerous:

  • Don't allow users to create arbitrary URLs (in <a>, <img>, etc.), as they can insert javascript: or one of its many variations. Whitelist only ^https?://.

  • HTML-escaping doesn't suffice in <script> (it doesn't use entity-escaping anyway) or in attributes that execute a script (onclick, etc.). For those you need json_encode().
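
Both points can be sketched briefly in Python, with json.dumps standing in for PHP's json_encode(). The names is_allowed_url and js_literal_for_script are hypothetical, and the extra replacement of "<" is an assumption added here because json.dumps, unlike json_encode() with its default flags, leaves "</script>" intact.

```python
import json
import re

# Whitelist from the first bullet: only http:// and https:// URLs pass.
ALLOWED_URL = re.compile(r"^https?://")


def is_allowed_url(url: str) -> bool:
    return ALLOWED_URL.match(url) is not None


def js_literal_for_script(value) -> str:
    """Encode a value as a JavaScript literal for use inside a <script> block."""
    # Assumption: also escape "<" so that the data cannot contain "</script>"
    # and close the script element early.
    return json.dumps(value).replace("<", "\\u003c")


for url in ("https://example.com/a.png", "javascript:alert(1)", " JaVaScRiPt:alert(1)"):
    print(repr(url), is_allowed_url(url))

print(js_literal_for_script("</script><script>alert(1)"))
```

Inside an event-handler attribute such as onclick, the JSON output would still need HTML-escaping on top, since the attribute value passes through the HTML parser first.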

answered Oct 19 '22 14:10 by Kornel