
htmlentities 'Invalid Multibyte Sequence' error

Tags: php

While trying to run a string through PHP's htmlentities function, I have some cases where I get an 'Invalid Multibyte Sequence' error. Is there a way to clean the string before calling the function to prevent this error from occurring?

GSto asked Feb 24 '10 15:02


1 Answer

As of PHP 5.4 you should use something along the lines of the following to properly escape output:

$escapedString = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5, $stringEncoding);

ENT_SUBSTITUTE replaces invalid code unit sequences with the Unicode replacement character � (U+FFFD) instead of returning an empty string.

ENT_DISALLOWED replaces code points that are invalid in the specified doctype with �.

ENT_HTML5 specifies the doctype in use. Depending on what you are using, you may choose ENT_HTML401, ENT_XHTML or ENT_XML1 instead.

Using those options, you make sure that the result is always valid in the given doctype, no matter how malformed the input is.
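To see what these flags do in practice, here is a minimal sketch, assuming PHP 7 or later and UTF-8 as the encoding. The helper name escapeHtml and the sample string are hypothetical, chosen only for illustration; the flag set is the one shown above.

```php
<?php
// Hypothetical helper (not part of PHP): wraps htmlspecialchars with the flags from the answer.
function escapeHtml(string $string, string $encoding = 'UTF-8'): string
{
    return htmlspecialchars(
        $string,
        ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5,
        $encoding
    );
}

// Hypothetical input: 0xE9 is "é" in ISO-8859-1, but a lone 0xE9 byte
// followed by a space is not a valid UTF-8 sequence.
$invalid = "caf\xE9 <b>";

// Without ENT_SUBSTITUTE, the invalid byte makes the whole result an empty string.
$withoutSubstitute = htmlspecialchars($invalid, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// With ENT_SUBSTITUTE, the bad byte becomes U+FFFD and the rest is escaped as usual.
$withSubstitute = escapeHtml($invalid);
```

The empty string returned without ENT_SUBSTITUTE is easy to mistake for missing data, which is why substituting only the offending byte is usually the friendlier behaviour.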

Also, don't forget to specify the $stringEncoding. Relying on the default is a bad idea, as it depends on ini settings and has changed between versions (PHP 5.4, for example, switched the default from ISO-8859-1 to UTF-8).
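The effect of the encoding argument can be sketched as follows. The sample string is a hypothetical ISO-8859-1 value; the same bytes are valid under ISO-8859-1 but not under UTF-8, so the result depends entirely on which encoding is passed.

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5;

// Hypothetical input: "café & co" stored as ISO-8859-1, where "é" is the single byte 0xE9.
$string = "caf\xE9 & co";

// Declared as ISO-8859-1, every byte is a valid character; only "&" is escaped.
$asLatin1 = htmlspecialchars($string, $flags, 'ISO-8859-1');

// Declared as UTF-8, the lone 0xE9 byte is invalid and is replaced with U+FFFD.
$asUtf8 = htmlspecialchars($string, $flags, 'UTF-8');
```

Passing the encoding that actually matches the stored data keeps the original characters intact; guessing wrong silently turns them into replacement characters.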

NikiC answered Oct 21 '22 03:10