I'm trying to understand the logic of the two functions mb_detect_encoding and mb_check_encoding, but the documentation is poor. I'm starting with a very simple test string:
$string = "\x65\x92";
This is a lowercase 'e' (0x65) followed by a right curly quote mark (0x92) when using Windows-1252 encoding.
I get the following results:
mb_detect_encoding($string,"Windows-1252"); // false
mb_check_encoding($string,"Windows-1252"); // true
mb_detect_encoding($string,"ISO-8859-1"); // ISO-8859-1
mb_check_encoding($string,"ISO-8859-1"); // true
mb_detect_encoding($string,"UTF-8",true); // false
mb_detect_encoding($string,"UTF-8"); // UTF-8
mb_check_encoding($string,"UTF-8"); // false
I don't understand why mb_detect_encoding gives "ISO-8859-1" for the string but not "Windows-1252", when, according to https://en.wikipedia.org/wiki/ISO/IEC_8859-1 and https://en.wikipedia.org/wiki/Windows-1252, the byte 0x92 is defined in the Windows-1252 character encoding but not in ISO-8859-1.
Secondly, I don't understand how mb_detect_encoding can return false but mb_check_encoding can return true for the same string and the same character encoding.
Finally, I don't understand why the string can ever be detected as UTF-8, strict mode or not. The byte 0x92 is a continuation byte in UTF-8, but in this string it follows a complete single-byte character, not the leading byte of a multi-byte sequence.
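To make the byte roles concrete, here is a minimal sketch that classifies a byte purely by its leading bit pattern. The helper name utf8ByteClass is made up for illustration and is not part of mbstring; it also ignores finer UTF-8 rules such as overlong encodings.

```php
<?php
// Hypothetical helper: classify one byte by its UTF-8 bit pattern only.
function utf8ByteClass(int $byte): string
{
    if (($byte & 0x80) === 0x00) return 'ascii';        // 0xxxxxxx
    if (($byte & 0xC0) === 0x80) return 'continuation'; // 10xxxxxx
    if (($byte & 0xE0) === 0xC0) return 'lead-2';       // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 'lead-3';       // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 'lead-4';       // 11110xxx
    return 'invalid';
}

$first  = utf8ByteClass(0x65); // 'ascii'
$second = utf8ByteClass(0x92); // 'continuation'
```

Since 0x65 is a complete character on its own, the 0x92 that follows has no leading byte to attach to, which is why the string is not well-formed UTF-8.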
Your examples do a good job of showing why mb_detect_encoding should be used sparingly, as it is not intuitive and sometimes logically wrong. If it must be used, always pass strict = true as the third parameter (so non-UTF-8 strings don't get reported as UTF-8).
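As a small sketch of that advice, using the same test string from the question, strict detection and mb_check_encoding agree that the string is not UTF-8. The variable names here are only for illustration.

```php
<?php
// 'e' followed by 0x92: fine in Windows-1252, but not well-formed UTF-8.
$string = "\x65\x92";

// Strict mode refuses to report UTF-8 for a string containing a stray
// continuation byte.
$strictResult = mb_detect_encoding($string, 'UTF-8', true);
var_dump($strictResult); // bool(false)

// mb_check_encoding validates against one named encoding and agrees.
$isValidUtf8 = mb_check_encoding($string, 'UTF-8');
var_dump($isValidUtf8); // bool(false)
```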
It's a bit more reliable to run mb_check_encoding over an array of desired encodings, in order of likelihood/priority. For example:
$encodings = [
    'UTF-8',
    'Windows-1252',
    'SJIS',
    'ISO-8859-1',
];
$encoding = 'UTF-8'; // fallback if no candidate validates
$string = 'foo';
foreach ($encodings as $candidate) {
    if (mb_check_encoding($string, $candidate)) {
        // We'll assume the encoding is $candidate since the string is valid in it
        $encoding = $candidate;
        break;
    }
}
The ordering depends on your priorities though.
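If the loop is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name firstValidEncoding and its null return for "no match" are my own choices, not anything provided by mbstring.

```php
<?php
/**
 * Hypothetical helper: return the first candidate encoding in which
 * $string is valid, or null if none of the candidates validates.
 *
 * @param string[] $candidates Encoding names, highest priority first.
 */
function firstValidEncoding(string $string, array $candidates): ?string
{
    foreach ($candidates as $candidate) {
        if (mb_check_encoding($string, $candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Plain ASCII is valid UTF-8, so the first candidate wins.
$a = firstValidEncoding('foo', ['UTF-8', 'ISO-8859-1']);

// The test string from the question is not valid UTF-8, so the
// search falls through to the next candidate.
$b = firstValidEncoding("\x65\x92", ['UTF-8', 'ISO-8859-1']);

// With UTF-8 as the only candidate, nothing matches.
$c = firstValidEncoding("\x65\x92", ['UTF-8']);
```

Returning null instead of a default leaves the fallback decision to the caller. Note also that single-byte encodings such as ISO-8859-1 accept every byte sequence, so anything listed after one will never be reached.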