
When used correctly, is htmlspecialchars sufficient for protection against all XSS?

If the following statements are true,

  • All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
  • All HTML attributes are enclosed in either single or double quotes.
  • There are no <script> tags in the document.

are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding named HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?

asked Oct 25 '13 07:10 by Alf Eaton



2 Answers

htmlspecialchars() is enough to prevent document-creation-time HTML injection under the limitations you state (i.e. no injection into tag content or unquoted attributes).
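To illustrate why the quoted-attribute limitation matters, here is a minimal sketch. The payload and variable names are made up for illustration. htmlspecialchars() leaves spaces and = untouched, so a value placed in an unquoted attribute can still introduce a new attribute:

```php
<?php
// Hypothetical attacker-controlled value; it contains none of & " ' < >.
$input = 'x onmouseover=alert(1)';

$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Unquoted: the space ends the value, so onmouseover becomes its own attribute.
$unsafe = '<input value=' . $escaped . '>';

// Quoted: the whole string stays inside the value attribute.
$safe = '<input value="' . $escaped . '">';

echo $unsafe, "\n", $safe, "\n";
```

The escaped string is identical to the input here, which is why the quoting has to do the work.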

However, there are other kinds of injection that can lead to XSS, and the condition "There are no <script> tags in the document." doesn't cover all cases of JS injection. You might, for example, have an event handler attribute (requires JS-escaping inside HTML-escaping):

<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):

<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!

It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
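As a rough sketch of the layering for the event handler case, assuming json_encode() is used for the JS-escaping step: the payload and the decodeAttribute() helper below are hypothetical, and the helper only stands in for the browser, which entity-decodes an attribute value before the script engine sees it.

```php
<?php
// Hypothetical payload that breaks out of a single-quoted JS string.
$xss = "');alert(document.cookie);//";

// Stand-in for the browser: attribute values are entity-decoded
// before being handed to the script engine.
function decodeAttribute(string $value): string
{
    return html_entity_decode($value, ENT_QUOTES, 'UTF-8');
}

// HTML-escaping only, as in the "bad" example.
$bad = "alert('" . htmlspecialchars($xss, ENT_QUOTES, 'UTF-8') . "')";

// JS-escaping first (json_encode yields a complete JS string literal),
// then HTML-escaping for the attribute context.
$jsLiteral = json_encode($xss, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
$good = 'alert(' . htmlspecialchars($jsLiteral, ENT_QUOTES, 'UTF-8') . ')';

echo decodeAttribute($bad), "\n";   // what the JS engine receives in the bad case
echo decodeAttribute($good), "\n";  // a single alert() call with one string argument
```

In the first case the decoded script contains a second, attacker-supplied alert() call; in the second the payload stays inside one string literal.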

And... injection issues can happen on the client-side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.

And... XSS has a wider range of causes than just injections. Other common causes are:

  • allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)

  • deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode which is invariably exploitable)

  • allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
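For the first cause in the list above, a known-good scheme check might look like the following sketch. The function names and the choice of allowed schemes (http, https, mailto) are assumptions for illustration, not a complete policy:

```php
<?php
// Hypothetical helper: accept only URLs that start with an allowed scheme.
function hasAllowedScheme(string $url): bool
{
    // Anchored at the start, so leading whitespace or control characters
    // (which browsers may strip before resolving the scheme) are rejected too.
    return preg_match('/^(?:https?|mailto):/i', $url) === 1;
}

// Hypothetical helper: build a link, falling back to "#" for rejected URLs.
function renderLink(string $url, string $label): string
{
    $href = hasAllowedScheme($url) ? $url : '#';
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo renderLink('https://example.com/?a=1&b=2', 'ok'), "\n";
echo renderLink('javascript:alert(1)', 'click me'), "\n";
```

This version deliberately rejects relative URLs as well; allowing them would need a separate, equally strict rule.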

answered Sep 28 '22 12:09 by bobince


Assuming you are not using older PHP versions (5.2 or so), htmlspecialchars() is "safe" (and of course taking the backend code into consideration, as @Royal Bg mentions).

In older PHP versions, malformed UTF-8 characters made this function vulnerable.
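For comparison, this sketch shows how a current PHP version treats malformed UTF-8, assuming PHP 5.4 or later. The byte string is made up for illustration. Without ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an empty string for input that is invalid in the given encoding:

```php
<?php
// "\xC0" can never appear in valid UTF-8; this input is purely illustrative.
$malformed = "\xC0<script>";

// Without ENT_SUBSTITUTE the whole string is rejected.
$strict = htmlspecialchars($malformed, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE the invalid byte is replaced and the rest is escaped.
$substituted = htmlspecialchars($malformed, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');
var_dump(strpos($substituted, '<') === false);
```

Either way, no raw < reaches the output.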

My 2 cents: always validate your inputs against what is allowed, instead of just escaping or encoding everything.

For example, if someone must enter a telephone number, I can imagine the following characters being allowed: 0123456789()+-. and a space; all others are simply ignored or stripped out.
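A minimal sketch of that allow-list for telephone numbers; the function name and the sample input are hypothetical:

```php
<?php
// Hypothetical helper: keep only digits, parentheses, +, -, . and spaces.
function sanitizePhone(string $input): string
{
    return preg_replace('/[^0-9()+\-. ]/', '', $input);
}

echo sanitizePhone("+31 (0)20-555.0100<script>"), "\n";
```

This complements output escaping rather than replacing it; the cleaned value should still go through htmlspecialchars() when written into HTML.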

The same would apply to addresses etc.; someone specifying UTF-8 characters for dots/blocks/hearts etc. in their address is probably not entering a real address.

answered Sep 28 '22 11:09 by Ronald Swets