Avoiding accidental double htmlspecialchars encoding?

Tags: html, php, encoding

I'm currently tightening up security on my website and I'm trying to make sure every single value passed from PHP to HTML is encoded correctly.

Currently, assigning values to the template encodes them, but some parts of the website are old and do not use templates.

I changed the functions I use to output HTML so that they encode all values. This worked great for covering the old pages, but it now double-encodes values coming from templates.

I then changed my encoding function to do:

$textToEncode = htmlspecialchars_decode($szText);
return htmlspecialchars($textToEncode, ENT_COMPAT, 'ISO-8859-1');

This has worked from what I can see. By decoding it first, it will always ensure it doesn't double encode and I can't think of any reason where decoding an unencoded string would cause problems. Is this an ok solution?

MatthewMcGovern asked May 20 '13 09:05


2 Answers

If you look at the manual, you'll see that what you're looking for is the last argument of the function, $double_encode. It is true by default; pass false so that entities already present in the string are not encoded again:

string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]] 

Thus:

htmlspecialchars($textToEncode, ENT_COMPAT, 'ISO-8859-1', false);
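
As a minimal sketch of what that fourth argument changes, the call can be wrapped and fed both a raw and an already-encoded string. The helper name encodeOnce and the sample strings are assumptions made for illustration; the flags and charset are the ones from the call above.

```php
<?php
// Hypothetical helper; only the htmlspecialchars() call comes from the answer.
function encodeOnce(string $text): string
{
    // Fourth argument false: entities already in $text (e.g. &amp;) are left alone.
    return htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1', false);
}

$raw            = 'Fish & <b>Chips</b>';
$alreadyEncoded = 'Fish &amp; &lt;b&gt;Chips&lt;/b&gt;';

// Both inputs end up as the same single-encoded string.
echo encodeOnce($raw), "\n";            // Fish &amp; &lt;b&gt;Chips&lt;/b&gt;
echo encodeOnce($alreadyEncoded), "\n"; // Fish &amp; &lt;b&gt;Chips&lt;/b&gt;

// With the default (double_encode = true) the second input is encoded again.
echo htmlspecialchars($alreadyEncoded, ENT_COMPAT, 'ISO-8859-1'), "\n";
// Fish &amp;amp; &amp;lt;b&amp;gt;Chips&amp;lt;/b&amp;gt;
```

Note that the function cannot tell an encoded string from raw text that merely contains something that looks like an entity; it skips anything entity-shaped either way.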
silkfire answered Nov 08 '22 17:11


You're simply out of luck. You either know that a string is encoded or not. You cannot detect or guess. What if I mean to write "&amp;" and a string in your database contains that value? That's the original, unencoded string. But it looks encoded.

You need to keep track of where, when and why you encode strings; you cannot figure it out reliably after the fact.
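
One possible way to keep track, sketched here purely as an illustration, is to carry the "already encoded" fact in the type instead of guessing from the content. The class EncodedHtml and the helper out() are hypothetical names, not something from the question or from PHP itself; the charset and flags are taken from the question.

```php
<?php
// Hypothetical marker type: an instance means "this string is already HTML-encoded".
final class EncodedHtml
{
    private $html;

    private function __construct(string $html)
    {
        $this->html = $html;
    }

    // The only way to create an instance: encode raw text exactly once.
    public static function fromText(string $text): self
    {
        return new self(htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1'));
    }

    public function __toString(): string
    {
        return $this->html;
    }
}

// Hypothetical output helper: plain strings are treated as raw text and encoded,
// values already marked as EncodedHtml pass through unchanged.
function out($value): string
{
    if ($value instanceof EncodedHtml) {
        return (string) $value;
    }
    return (string) EncodedHtml::fromText((string) $value);
}

echo out('a & b'), "\n";                        // a &amp; b
echo out(EncodedHtml::fromText('a & b')), "\n"; // a &amp; b (not encoded twice)
echo out('&amp;'), "\n";                        // &amp;amp; (raw text that looks encoded)
```

In this sketch a template layer would hand over EncodedHtml values while the old pages keep passing plain strings, so the output function never has to inspect the content to decide.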

If one of your users wrote this in your hypothetical forum:

The HTML entity for "&" is "&amp;".

Then your decoding and encoding, or "intelligent non-double encoding" that @Robert suggests, would make it show up in the browser as:

The HTML entity for "&" is "&".

And all meaning of that post is lost.
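
The effect can be reproduced with a short script. This is a sketch using the forum sentence above as input; the variable names are made up for the example, and the flags and charset follow the question.

```php
<?php
$post = 'The HTML entity for "&" is "&amp;".';

// Decode-then-encode, as in the question.
$viaDecode = htmlspecialchars(
    htmlspecialchars_decode($post, ENT_COMPAT),
    ENT_COMPAT,
    'ISO-8859-1'
);

// Encoding with double_encode = false.
$viaFlag = htmlspecialchars($post, ENT_COMPAT, 'ISO-8859-1', false);

// Plain single encoding of text that is known to be raw.
$plain = htmlspecialchars($post, ENT_COMPAT, 'ISO-8859-1');

echo $viaDecode, "\n"; // The HTML entity for &quot;&amp;&quot; is &quot;&amp;&quot;.
echo $viaFlag, "\n";   // The HTML entity for &quot;&amp;&quot; is &quot;&amp;&quot;.
echo $plain, "\n";     // The HTML entity for &quot;&amp;&quot; is &quot;&amp;amp;&quot;.

// Only the plain encoding decodes back to what the user typed.
var_dump(htmlspecialchars_decode($plain, ENT_COMPAT) === $post);     // bool(true)
var_dump(htmlspecialchars_decode($viaDecode, ENT_COMPAT) === $post); // bool(false)
```

Both "smart" variants collapse the literal &amp; the user typed into a bare ampersand, which is exactly the loss described above.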

deceze answered Nov 08 '22 16:11