Using PHP 5.4's new constant ENT_DISALLOWED in htmlentities

I'm trying to output a string HTML-encoded, but the htmlentities() function always returns an empty string.

I know exactly why it does so: I am not running PHP 5.4. I have the latest PHP 5.3 release installed.

The question is: how can I HTML-encode a string that contains invalid code unit sequences?

According to the manual, ENT_SUBSTITUTE is the way to go. But this constant is not defined in PHP 5.3.x.

I did this:

if (!defined('ENT_SUBSTITUTE')) {
    define('ENT_SUBSTITUTE', 8);
}

Still no luck: htmlentities() still returns an empty string.
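A minimal sketch reproducing the symptom, assuming the input holds Windows-1252 smart quotes (bytes 0x93 and 0x94) while htmlentities() is told the charset is UTF-8. The helper name `is_valid_utf8` is hypothetical; it relies on preg_match() returning false when a pattern with the `u` modifier is applied to malformed UTF-8.

```php
<?php
// Hypothetical helper: an empty pattern with the /u modifier matches any
// well-formed UTF-8 string, and preg_match() returns false on malformed input.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// "\x93" and "\x94" are Windows-1252 smart quotes, which are not valid UTF-8.
$input = "\x93quoted\x94";

var_dump(is_valid_utf8($input));                     // bool(false)
var_dump(htmlentities($input, ENT_QUOTES, 'UTF-8')); // string(0) ""
```

Checking the input this way before escaping at least tells you which strings need fixing instead of silently printing nothing.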

I wanted to try ENT_DISALLOWED instead, but I cannot find its corresponding integer value.

So my question is twofold:

  1. What is the numeric value of PHP 5.4's ENT_DISALLOWED constant?

  2. How do I clean a string containing non-UTF-8 characters (such as smart quotes)? Not just the smart quotes, but anything that causes htmlentities() to return an empty string.
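For the second part, one possible approach on PHP 5.3 is to emulate ENT_SUBSTITUTE by hand: keep every well-formed UTF-8 sequence and replace every other byte with U+FFFD before calling htmlentities(). This is a sketch, not a drop-in equivalent: the function name `substitute_invalid_utf8` is hypothetical, it replaces each stray byte individually (PHP 5.4's ENT_SUBSTITUTE works per invalid sequence), and smart quotes end up as U+FFFD rather than as quotes.

```php
<?php
// Hypothetical helper for PHP 5.3: keep well-formed UTF-8 sequences and
// replace every other byte with U+FFFD (the Unicode replacement character),
// roughly mimicking ENT_SUBSTITUTE on PHP 5.4+.
function substitute_invalid_utf8($s)
{
    $pattern = '/
        (   [\x00-\x7F]                          # ASCII
          | [\xC2-\xDF][\x80-\xBF]               # 2-byte sequence
          | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, no overlongs
          | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3-byte
          | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, no surrogates
          | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4-byte, no overlongs
          | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte
          | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4-byte, up to U+10FFFF
        )
        | .                                      # any other single byte
    /xs';

    return preg_replace_callback($pattern, function ($m) {
        // Group 1 is only present when a valid sequence matched.
        return isset($m[1]) ? $m[1] : "\xEF\xBF\xBD";
    }, $s);
}

echo htmlentities(substitute_invalid_utf8("\x93hi\x94"), ENT_QUOTES, 'UTF-8'), "\n";
```

The pattern runs without the `u` modifier on purpose, so it matches raw bytes and never fails on malformed input. If you know the real encoding of the input, declaring or converting it is preferable to substituting.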

Average Joe asked Sep 18 '12 at 00:09


2 Answers

It is true that htmlentities() in PHP 5.3 does not have the ENT_SUBSTITUTE flag; however, it does have the (not really recommended) ENT_IGNORE flag. Beware of the note in the manual and try to understand it before using the flag:

Using this flag is discouraged as it may have security implications.
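A minimal sketch of a version-aware wrapper, assuming you accept ENT_IGNORE's caveat on 5.3: it uses ENT_SUBSTITUTE on PHP 5.4+ and falls back to ENT_IGNORE on 5.3. The function name `html_escape_lenient` is hypothetical. It checks PHP_VERSION_ID rather than defined('ENT_SUBSTITUTE'), because a hand-made define like the one in the question would make defined() return true on 5.3 even though htmlentities() there ignores the flag.

```php
<?php
// Hypothetical wrapper: prefer ENT_SUBSTITUTE (PHP 5.4+), fall back to the
// discouraged ENT_IGNORE on PHP 5.3 so invalid UTF-8 does not yield "".
function html_escape_lenient($s, $charset = 'UTF-8')
{
    // The ENT_SUBSTITUTE branch is only evaluated on 5.4+, where it exists.
    $flags = ENT_QUOTES | (PHP_VERSION_ID >= 50400 ? ENT_SUBSTITUTE : ENT_IGNORE);
    return htmlentities($s, $flags, $charset);
}

echo html_escape_lenient("Tom & \x93Jerry\x94"), "\n";
```

On 5.4+ the invalid bytes become U+FFFD; on 5.3 they are silently dropped, which is exactly the behavior the manual warns about.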

It is far better to understand why the input string is a problem in the first place. Most often, users have simply not specified the correct encoding.

For example, first re-encode the string into UTF-8, then pass it to htmlspecialchars() or htmlentities(). Speaking of smart quotes, you are probably using a Windows-1252 encoded string. You don't even need to convert that one first; you can just specify the charset properly (PHP 5.3):

htmlentities($string, ENT_QUOTES, 'Windows-1252'); // third parameter is the charset

Naturally, this only works if the input $string is actually encoded in Windows-1252 (CP1252). Find out the correct encoding first; then it is normally no problem. For encodings htmlentities() does not support, re-encode into a supported one first, for example with iconv or mbstring (mb_convert_encoding()).
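A sketch of the re-encode-first approach, assuming that any input which is not valid UTF-8 is Windows-1252 (CP1252). That assumption and the function name `to_html_from_cp1252` are hypothetical; detect the real encoding where you can. It uses iconv() to convert to UTF-8 and then escapes with an explicit UTF-8 charset.

```php
<?php
// Hypothetical helper: if the input is not valid UTF-8, assume it is CP1252,
// convert it to UTF-8 with iconv(), then escape it as UTF-8.
function to_html_from_cp1252($s)
{
    if (preg_match('//u', $s) !== 1) {
        $converted = iconv('CP1252', 'UTF-8', $s);
        if ($converted === false) {
            // iconv could not map some byte; decide how your app should handle it.
            return '';
        }
        $s = $converted;
    }
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

echo to_html_from_cp1252("\x93smart quotes\x94"), "\n"; // &ldquo;smart quotes&rdquo;
```

Converting once at the input boundary keeps the rest of the application working in a single encoding, which is usually easier than passing per-call charsets around.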

hakre answered Sep 20 '22 at 13:09


As you say, these constants were added in 5.4.0. The thing is, htmlentities() only gained support for them in 5.4.0 as well. That means you can define and pass whatever values you like on 5.3; the older htmlentities() will simply not understand them.

As is often the case, the PHP changelog is rather misleading here.
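To answer the first part of the question directly, a small script run on PHP 5.4 or later can print the numeric flag values. Note that ENT_DISALLOWED replaces code points that are invalid for the chosen document type; it does not handle malformed byte sequences, so it would not fix the empty-string problem even on 5.4.

```php
<?php
// Prints the numeric flag values on PHP 5.4+. On 5.3 the last two constants
// do not exist, and defining them by hand has no effect on htmlentities().
if (PHP_VERSION_ID >= 50400) {
    printf("ENT_IGNORE     = %d\n", ENT_IGNORE);     // 4 (available since 5.3.0)
    printf("ENT_SUBSTITUTE = %d\n", ENT_SUBSTITUTE); // 8
    printf("ENT_DISALLOWED = %d\n", ENT_DISALLOWED); // 128
} else {
    echo "PHP ", PHP_VERSION, ": ENT_SUBSTITUTE and ENT_DISALLOWED are not supported\n";
}
```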

Mikulas Dite answered Sep 19 '22 at 13:09