I'm currently tightening up security on my website and I'm trying to make sure every single value passed from PHP to HTML is encoded correctly.
Currently, assigning a value to the template encodes it; however, some parts of the website are old and do not use templates.
I changed the workings of the functions I use to output HTML to encode all the values. This worked great for covering all the old pages, however it now causes double encoding on template values.
I changed the encoding function I use to do:
$textToEncode = htmlspecialchars_decode($szText);
return htmlspecialchars($textToEncode, ENT_COMPAT, 'ISO-8859-1');
This has worked from what I can see. By decoding it first, it will always ensure it doesn't double encode and I can't think of any reason where decoding an unencoded string would cause problems. Is this an ok solution?
Using the htmlspecialchars() function: htmlspecialchars() converts special characters to HTML entities. It is suitable for most web apps and is one of the most common ways to prevent XSS. This process is also known as HTML escaping.
Difference between htmlentities() and htmlspecialchars(): htmlspecialchars() converts only the handful of characters that have special meaning in HTML, whereas htmlentities() converts every character that has a named HTML entity.
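As a rough illustration of that difference, the sketch below runs both functions on the same string. The sample text is made up, and the flags and charset are passed explicitly so the result does not depend on the defaults of a particular PHP version.

```php
<?php
// Made-up sample text: one accented letter plus two HTML-special characters.
$raw = "caf\u{00E9} & <b>";

// Only the HTML-special characters are converted; the accented letter is left alone.
$special = htmlspecialchars($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Every character with a named entity is converted, including the accented letter.
$entities = htmlentities($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $special, PHP_EOL;  // café &amp; &lt;b&gt;
echo $entities, PHP_EOL; // caf&eacute; &amp; &lt;b&gt;
```

For XSS protection both results are equally safe; when the page is served in the same charset as the data, htmlspecialchars() is usually enough.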
The first question is: When to use the htmlspecialchars function? You use htmlspecialchars EVERY time you output content within HTML, so it is interpreted as content and not HTML. If you allow content to be treated as HTML, you have just opened the door to bugs at a minimum, and total XSS hacks at worst.
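The htmlspecialchars() function converts special characters (& (ampersand), " (double quote), ' (single quote), < (less than), > (greater than)) to HTML entities: & becomes &amp;amp;, " becomes &amp;quot;, ' becomes &amp;#039;, < becomes &amp;lt;, and > becomes &amp;gt;. Written literally, those entities are &amp; &quot; &#039; &lt; &gt;. Note that the single quote is only converted when ENT_QUOTES is set; with ENT_COMPAT it is left as-is.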
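A small sketch of those conversions, using a made-up input string. It shows the same text encoded with ENT_COMPAT (as in the question) and with ENT_QUOTES, so the effect on the single quote is visible.

```php
<?php
// Made-up input containing all five special characters.
$raw = '<a href="x">Tom & Jerry\'s</a>';

// ENT_COMPAT: converts & < > and double quotes, leaves single quotes untouched.
$compat = htmlspecialchars($raw, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: additionally converts single quotes to &#039;.
$quotes = htmlspecialchars($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $compat, PHP_EOL; // &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry's&lt;/a&gt;
echo $quotes, PHP_EOL; // &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#039;s&lt;/a&gt;
```

ENT_QUOTES is the safer choice whenever a value may end up inside a single-quoted HTML attribute.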
If you look at the manual, you'll see that what you're looking for is the last argument of the function, $double_encode. It is true by default; set it to false:
string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = 'UTF-8' [, bool $double_encode = true ]]]
Thus:
htmlspecialchars($textToEncode, ENT_COMPAT, 'ISO-8859-1', false);
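To make the effect of that fourth argument concrete, here is a minimal sketch with made-up sample strings. It compares the default behaviour with $double_encode = false on a string that is already encoded and on one that is not.

```php
<?php
// Made-up samples: the same text, once already encoded and once raw.
$alreadyEncoded = 'Tom &amp; Jerry';
$rawText        = 'Tom & Jerry';

// Default ($double_encode = true): the existing entity is encoded again.
$default = htmlspecialchars($alreadyEncoded, ENT_COMPAT, 'ISO-8859-1');

// $double_encode = false: existing entities are left alone.
$noDouble = htmlspecialchars($alreadyEncoded, ENT_COMPAT, 'ISO-8859-1', false);

// Raw text is still encoded as usual.
$rawEncoded = htmlspecialchars($rawText, ENT_COMPAT, 'ISO-8859-1', false);

echo $default, PHP_EOL;    // Tom &amp;amp; Jerry
echo $noDouble, PHP_EOL;   // Tom &amp; Jerry
echo $rawEncoded, PHP_EOL; // Tom &amp; Jerry
```

Both inputs end up as the same output, which is exactly what the question asks for, and also why the function can no longer tell a raw string that merely looks encoded from one that really is.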
You're simply out of luck. You either know that a string is encoded or not. You cannot detect or guess. What if I mean to write "&amp;" and a string in your database contains that value? That's the original, unencoded string. But it looks encoded.
You need to keep track of where and when and why you encode strings, you cannot figure it out reliably after the fact.
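One way to keep track is to record "already encoded" in the type of the value rather than guessing from its contents. The following is only a sketch of that idea: the SafeHtml class and the e() and assignToTemplate() helpers are hypothetical names invented for this example, not part of PHP or of the asker's codebase.

```php
<?php
// Hypothetical marker type: wraps a string that has ALREADY been HTML-encoded.
final class SafeHtml
{
    private $html;

    public function __construct(string $html)
    {
        $this->html = $html;
    }

    public function __toString(): string
    {
        return $this->html;
    }
}

// Hypothetical template assignment: encodes once and records that fact in the type.
function assignToTemplate(string $raw): SafeHtml
{
    return new SafeHtml(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Hypothetical output helper for the legacy pages: plain strings are encoded
// exactly once, SafeHtml values pass through unchanged.
function e($value): string
{
    if ($value instanceof SafeHtml) {
        return (string) $value;
    }
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

echo e('a & b'), PHP_EOL;                   // a &amp; b
echo e(assignToTemplate('a & b')), PHP_EOL; // a &amp; b
echo e('&amp;'), PHP_EOL;                   // &amp;amp;
```

The last line is the important one: a raw string that happens to contain "&amp;" is still encoded, because nothing marked it as already safe.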
If one of your users wrote this in your hypothetical forum:
The HTML entity for "&" is "&amp;".
Then your decoding and encoding, or "intelligent non-double encoding" that @Robert suggests, would turn this into:
The HTML entity for "&" is "&".
And all meaning of that post is lost.
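The forum-post example can be checked with a short script. This is a sketch under a few assumptions: the post text is held in a variable as the user typed it, ENT_NOQUOTES is used only to keep the quotes readable, and html_entity_decode() stands in for what a browser would display.

```php
<?php
// The post exactly as the user typed it (raw, unencoded text).
$post = 'The HTML entity for "&" is "&amp;".';

// Encode exactly once: the only variant that survives a round trip.
$once = htmlspecialchars($post, ENT_NOQUOTES, 'UTF-8');

// Decode first, then encode (the approach from the question).
$decodeFirst = htmlspecialchars(
    htmlspecialchars_decode($post, ENT_NOQUOTES),
    ENT_NOQUOTES,
    'UTF-8'
);

// Encode with $double_encode = false.
$noDouble = htmlspecialchars($post, ENT_NOQUOTES, 'UTF-8', false);

// Approximate what the reader sees by decoding the output once.
$show = function (string $html): string {
    return html_entity_decode($html, ENT_NOQUOTES, 'UTF-8');
};

echo $show($once), PHP_EOL;        // The HTML entity for "&" is "&amp;".
echo $show($decodeFirst), PHP_EOL; // The HTML entity for "&" is "&".
echo $show($noDouble), PHP_EOL;    // The HTML entity for "&" is "&".
```

Only the encode-once variant shows the reader what the user actually wrote; the other two silently collapse the literal "&amp;" into "&".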