Detecting the right character encoding in PHP?

Question

I'm trying to detect the character encoding of a string but I can't get the right result.
For example:

$str = "&euro; &sbquo; &fnof; &bdquo; &hellip;";
$str = mb_convert_encoding($str, 'Windows-1252', 'HTML-ENTITIES');
// Now $str should be a Windows-1252-encoded string.
// Let's detect its encoding:
echo mb_detect_encoding($str, 'Windows-1252, ISO-8859-1, UTF-8');

That code outputs ISO-8859-1 but it should be Windows-1252.

What's wrong with this?

EDIT:
Updated example, in response to @raina77ow.

$str = "&euro;&sbquo;&fnof;&bdquo;&hellip;"; // no whitespace
$str = mb_convert_encoding($str, 'Windows-1252', 'HTML-ENTITIES');
$str = "Hello $str"; // let's add some ASCII characters
echo mb_detect_encoding($str, 'Windows-1252, ISO-8859-1, UTF-8');

I get the wrong result again.

scy · Accepted Answer

The problem with Windows-1252 in PHP is that mb_detect_encoding() will almost never report it: as soon as your text contains any byte outside the range 0x80 to 0x9F, the string is rejected as Windows-1252.

This means that if your string contains an ordinary ASCII letter like "A", or even a space character, PHP decides that it is not valid Windows-1252 and, in your case, falls back to the next candidate in your list, which is ISO-8859-1. This is a PHP bug; see https://bugs.php.net/bug.php?id=64667.
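
If you only need to tell these three encodings apart, one possible workaround is to skip mb_detect_encoding() and apply a byte-level heuristic yourself. The sketch below is an assumption on my part, not something from the bug report: guess_encoding() is a hypothetical helper name, and the rule "bytes 0x80 to 0x9F imply Windows-1252" is only a guess based on the fact that ISO-8859-1 maps that range to rarely used control characters. It also reports plain 7-bit input as ASCII, which is valid in all three encodings.

```php
<?php
// Hypothetical helper (not part of mbstring): guesses between UTF-8,
// Windows-1252 and ISO-8859-1 by looking at the raw bytes.
function guess_encoding(string $bytes): string
{
    // No byte >= 0x80: pure ASCII, identical in all three encodings.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) {
        return 'ASCII';
    }

    // Well-formed multibyte sequences are very unlikely to occur by
    // accident in single-byte text, so test UTF-8 first.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    // Assumption: 0x80-0x9F are printable characters in Windows-1252
    // but C1 control codes in ISO-8859-1, so their presence suggests
    // Windows-1252. This is a heuristic, not a proof.
    if (preg_match('/[\x80-\x9F]/', $bytes)) {
        return 'Windows-1252';
    }

    return 'ISO-8859-1';
}
```

With the string from the question, written out as raw bytes, guess_encoding("Hello \x80\x82\x83\x84\x85") takes the Windows-1252 branch. Keep in mind that text using only bytes 0xA0 to 0xFF is byte-for-byte identical in both single-byte encodings, so the ISO-8859-1 result for such input is a default rather than a detection.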

Tags: php, character-encoding, detection, multibyte

GetFree
