While running a string through PHP's htmlentities function, I sometimes get an 'Invalid Multibyte Sequence' error. Is there a way to clean the string before calling the function to prevent this error from occurring?
As of PHP 5.4 you should use something along the following lines to properly escape output:
$escapedString = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5, $stringEncoding);
ENT_SUBSTITUTE
replaces invalid code unit sequences with � (U+FFFD, the Unicode replacement character) instead of returning an empty string.
ENT_DISALLOWED
replaces code points that are invalid in the specified doctype with �.
ENT_HTML5
specifies the doctype in use. Depending on what you are using, you may choose ENT_HTML401, ENT_XHTML or ENT_XML1 instead.
With those options, the result is always valid in the given doctype, no matter how malformed the input is.
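To make the effect of ENT_SUBSTITUTE concrete, here is a minimal sketch. It assumes the output encoding is UTF-8 and uses a made-up input string containing a single ISO-8859-1 byte, which is invalid as UTF-8:

```php
<?php
// "\xE9" is "é" in ISO-8859-1, but on its own it is not a valid UTF-8 sequence.
$string = "caf\xE9 <b>";

// Without ENT_SUBSTITUTE (or ENT_IGNORE), one invalid sequence makes
// htmlspecialchars return an empty string.
$plain = htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid byte is replaced by U+FFFD
// (bytes EF BF BD in UTF-8) and the rest is escaped as usual.
$safe = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');

var_dump($plain === '');                          // bool(true)
var_dump($safe === "caf\xEF\xBF\xBD &lt;b&gt;");  // bool(true)
```

The flags are passed explicitly in both calls so the comparison does not depend on the default flags of the PHP version in use.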
Also, don't forget to specify the $stringEncoding. Relying on the default is a bad idea, as it depends on ini settings and may (and did) change between versions.
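One way to avoid forgetting the encoding is to wrap the call in a small helper. The sketch below is only an illustration: the function name escapeHtml and its 'UTF-8' default are assumptions, not part of PHP, and the default should be changed to whatever encoding your strings are actually in.

```php
<?php
// Hypothetical helper (not a PHP built-in): escapes $string for HTML5 output.
// $encoding is always passed on explicitly, so the ini default is never used.
function escapeHtml($string, $encoding = 'UTF-8')
{
    return htmlspecialchars(
        $string,
        ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5,
        $encoding
    );
}

$escaped = escapeHtml('<a href="x">Tom & Jerry\'s</a>');
echo $escaped, "\n";
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

With ENT_HTML5, ENT_QUOTES encodes the single quote as &apos;; under ENT_HTML401 it would be &#039; instead.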