Possible Duplicate:
Detect file encoding in PHP
How can I figure out with PHP what file encoding a file has?
Detecting the encoding is really hard for all 8-bit character sets except UTF-8 (not every byte sequence is valid UTF-8, so invalid input can at least be ruled out), and it usually requires semantic knowledge of the text whose encoding is to be detected.
Think about it: a plain text file is just a sequence of bytes with no encoding information attached. Any single byte could mean anything, so to have a chance at detecting the encoding you have to look at that byte in the context of the bytes around it and apply heuristics based on the languages the text might be written in.
For 8-bit character sets you can never be sure, though.
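To make the ambiguity concrete, here is a small sketch that decodes the same single byte under two different 8-bit character sets. It assumes the mbstring extension is loaded; the byte value and the two charsets are chosen purely for illustration.

```php
<?php
// Assumes the mbstring extension is available.
// One byte, two equally plausible readings depending on the assumed charset.
$byte = "\xE4";

// Read as ISO-8859-1, 0xE4 is LATIN SMALL LETTER A WITH DIAERESIS (U+00E4).
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Read as Windows-1251, 0xE4 is CYRILLIC SMALL LETTER DE (U+0434).
$asCyrillic = mb_convert_encoding($byte, 'UTF-8', 'Windows-1251');

echo bin2hex($asLatin1), "\n";   // c3a4
echo bin2hex($asCyrillic), "\n"; // d0b4
```

Both conversions succeed without complaint, which is exactly the problem: nothing in the byte itself says which reading was intended.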
For a demonstration of such heuristics going wrong, see the "Bush hid the facts" Notepad bug:
http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html
With some 16-bit encodings you do have a chance, because the text may start with a byte order mark or, for mostly ASCII text, have every second byte set to 0.
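The byte order mark check could be sketched roughly like this. The function name `detectBom` is hypothetical, and the sketch only covers the BOMs of UTF-8, UTF-16 and UTF-32; a missing BOM tells you nothing, so `null` means "unknown", not "not Unicode".

```php
<?php
// Hypothetical helper: returns the encoding named by a leading
// byte order mark, or null if the data does not start with one.
function detectBom(string $bytes): ?string
{
    // Longer marks are listed first, because the UTF-32LE BOM
    // begins with the same two bytes as the UTF-16LE BOM.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];

    foreach ($boms as $encoding => $bom) {
        // strncmp() is binary safe, so NUL bytes in the BOM are fine.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }

    return null;
}
```

For a file you would typically pass only the first few bytes, for example `detectBom(file_get_contents($path, false, null, 0, 4))`, where `$path` is whatever file you are inspecting. Note that this is still a heuristic: UTF-16LE text that starts with a NUL character right after its BOM would be reported as UTF-32LE.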
If you just want to detect UTF-8, you can either use mb_detect_encoding as already explained, or you can use this handy little function:
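    // Returns 1 if $string contains at least one well-formed multibyte
    // UTF-8 sequence, 0 otherwise. The pattern is not anchored, so pure
    // ASCII yields 0 and invalid bytes elsewhere in the string are ignored.
    function isUTF8($string) {
        return preg_match('%(?:
            [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
            |\xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
            |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            |\xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
            |\xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
            |[\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            |\xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
        )+%xs', $string);
    }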
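If what you need is a stricter answer to "is this entire string well-formed UTF-8?", a minimal sketch could rely on built-in functions instead. The helper names `isValidUtf8` and `isValidUtf8Pcre` are hypothetical; the first assumes the mbstring extension is loaded, the second only needs PCRE.

```php
<?php
// Hypothetical helper: true only if the whole string is well-formed UTF-8.
// Assumes the mbstring extension is available.
function isValidUtf8(string $string): bool
{
    return mb_check_encoding($string, 'UTF-8');
}

// Alternative without mbstring: with the "u" modifier, preg_match()
// returns false (not 0) when the subject is not valid UTF-8, and the
// empty pattern matches any valid subject.
function isValidUtf8Pcre(string $string): bool
{
    return preg_match('//u', $string) === 1;
}
```

Keep in mind that plain ASCII passes both checks, because ASCII is a subset of UTF-8, and that a positive result only says the bytes are consistent with UTF-8, not that the author actually meant them that way.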