mb_detect_encoding doesn't work properly with Windows-1250 (CP1250)

I have a problem detecting CP1250 with mb_detect_encoding(). In my case I want to detect 3 encodings:

mb_detect_encoding($string, 'UTF-8,ISO-8859-2,Windows-1250')

But Windows-1250 isn't among the supported encodings. Is there any solution?

Piotr Olaszewski asked Oct 17 '25 19:10


1 Answer

mb_detect_encoding always "detects" single-byte encodings, because every byte is valid in them and there is nothing for the detector to reject. You can read about this in the documentation for mb_detect_order:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-X, mbstring always detects as ISO-8859-X.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

Conclusions:

  1. It's meaningless to ask for detection of ISO-8859-2; it will always tell you "yes, that's it" (unless of course it detects UTF-8 first).
  2. Windows-1250 is not supported, but even if it were, it would behave exactly like ISO-8859-2.
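To see point 1 in practice, here is a minimal sketch assuming the mbstring extension is loaded; the two byte strings are made up for illustration. Strict mode (the third argument) makes mb_detect_encoding skip candidates for which the string is invalid. ISO-8859-2 is never rejected, so with these two candidates it is returned whenever the string is not valid UTF-8, whatever the bytes actually mean.

```php
<?php
// Requires the mbstring extension.

// "ąś" encoded as ISO-8859-2 (the same bytes read as "±¶" in Windows-1250).
// 0xB1 is a UTF-8 continuation byte with no lead byte, so this is invalid UTF-8.
$latin2Bytes = "\xB1\xB6";

// "ąś" encoded as UTF-8 (U+0105 = C4 85, U+015B = C5 9B).
$utf8Bytes = "\xC4\x85\xC5\x9B";

$candidates = ['UTF-8', 'ISO-8859-2'];

// UTF-8 is rejected for the first string; ISO-8859-2 accepts any byte string.
echo mb_detect_encoding($latin2Bytes, $candidates, true), "\n"; // ISO-8859-2
echo mb_detect_encoding($utf8Bytes, $candidates, true), "\n";   // UTF-8
```

Note that the first call would also say "ISO-8859-2" if the bytes had really been Windows-1250 text, which is exactly the problem in the question.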

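In general, it is impossible to detect single-byte encodings accurately: almost any byte string is valid in several of them at once, so the only way to choose is to guess from what the decoded text looks like. If you need to do that in PHP, you will have to do it manually, and you shouldn't expect very good results.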
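A manual check could look roughly like the sketch below. It assumes the text is Polish; the function name guessPolishEncoding is hypothetical and it does not use mbstring at all. It relies on two facts about the code pages: bytes 0x80-0x9F are C1 control characters in ISO-8859-2 but printable characters in Windows-1250, and some Polish letters (Ą, ą, Ś, ś, Ź, ź) sit at different byte values in the two encodings.

```php
<?php
/**
 * Hypothetical helper (not part of mbstring): a rough guess between
 * UTF-8, ISO-8859-2 and Windows-1250, assuming the text is Polish.
 * This is a heuristic and can easily be wrong on short strings.
 */
function guessPolishEncoding(string $bytes): string
{
    // Valid UTF-8 (which includes pure ASCII) is reported as UTF-8.
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return 'UTF-8';
    }

    // Bytes 0x80-0x9F are C1 control characters in ISO-8859-2 but printable
    // characters in Windows-1250 (e.g. 0x9C = 'ś', 0x9F = 'ź'), so in real
    // text they point to Windows-1250.
    if (preg_match('/[\x80-\x9F]/', $bytes) === 1) {
        return 'Windows-1250';
    }

    // Otherwise count the Polish letters whose byte values differ:
    // ISO-8859-2:   0xA1 'Ą', 0xA6 'Ś', 0xAC 'Ź', 0xB1 'ą', 0xB6 'ś', 0xBC 'ź'
    // Windows-1250: 0xA5 'Ą', 0xB9 'ą' (its Ś, ś, Ź, ź are in 0x80-0x9F, checked above)
    $latin2Hits = preg_match_all('/[\xA1\xA6\xAC\xB1\xB6\xBC]/', $bytes);
    $cp1250Hits = preg_match_all('/[\xA5\xB9]/', $bytes);

    // Ties (including text with no distinguishing letters) default to ISO-8859-2.
    return $cp1250Hits > $latin2Hits ? 'Windows-1250' : 'ISO-8859-2';
}

echo guessPolishEncoding("\x9Cwiat"), "\n"; // "świat" in Windows-1250 -> Windows-1250
echo guessPolishEncoding("\xB6wiat"), "\n"; // "świat" in ISO-8859-2 -> ISO-8859-2
```

Most of the work is done by the 0x80-0x9F check; a string with no Polish diacritics outside ASCII gives the letter counts nothing to compare, so the result falls back to the default, which is the kind of poor accuracy the answer warns about.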

Jon answered Oct 19 '25 10:10