How can I convert an array of bytes into a UTF-8 string? I need this because I am extracting strings from a binary format.
All PHP string functions work well with UTF-8 encoded strings as long as the strings use only 7-bit ASCII characters (because the encoding of the first 128 characters is identical in ASCII and UTF-8).
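A small illustration of where that ASCII-only assumption breaks. This sketch assumes PHP 7.0+ (for the `\u{...}` escape) and the mbstring extension being installed. It compares the byte-oriented `strlen()` with the encoding-aware `mb_strlen()`:

```php
<?php
// "\u{00E9}" is "é"; the escape yields its UTF-8 bytes regardless of the source file's encoding
$ascii = 'Hello';
$accented = "caf\u{00E9}";

// Byte-oriented: strlen() counts bytes, not characters
$asciiBytes = strlen($ascii);        // 5
$accentedBytes = strlen($accented);  // 5, because "é" takes two bytes in UTF-8

// Character-oriented: mb_strlen() with an explicit encoding counts characters
$accentedChars = mb_strlen($accented, 'UTF-8'); // 4

echo "$asciiBytes $accentedBytes $accentedChars\n";
```

For pure ASCII both functions agree; as soon as a multi-byte character appears, only the `mb_*` variant counts characters.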
A ByteArray holds, well, an array of bytes. One byte contains 8 bits, and each bit is either 1 or 0 (binary). Think of a byte as a string that is 8 characters long, where each character is either a 1 or a 0: 00101010, 00000001, 11110000, etc. Each of these bytes can be represented in base 10 by a number from 0 to 255.
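To make the binary/decimal correspondence concrete, here is a minimal sketch using PHP's built-in `bindec()` and `sprintf()` with the `%08b` format (zero-padded to 8 binary digits):

```php
<?php
// A byte written as 8 binary digits, and its base-10 value
$bits = '00101010';
$value = bindec($bits);              // 42
// And back: format an integer 0-255 as exactly 8 binary digits
$padded = sprintf('%08b', $value);   // "00101010"

// The full range of a single byte
$min = bindec('00000000');           // 0
$max = bindec('11111111');           // 255

echo "$bits = $value, range $min-$max\n";
```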
In PHP's XML parser extension, the default source encoding is ISO-8859-1. Target encoding is done when PHP passes data to the XML handler functions.
UTF-8 (UCS Transformation Format 8) is the World Wide Web's most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.
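The "one to four bytes" rule can be checked directly. This sketch (assuming PHP 7.0+ for the `\u{...}` escapes) picks one character from each UTF-8 length class and prints its byte count and its bytes in hex via `bin2hex()`:

```php
<?php
// One character from each UTF-8 length class
$samples = [
    'A'         => "A",          // U+0041, plain ASCII
    'e-acute'   => "\u{00E9}",   // U+00E9
    'euro sign' => "\u{20AC}",   // U+20AC
    'emoji'     => "\u{1F600}",  // U+1F600, outside the Basic Multilingual Plane
];

$lengths = [];
foreach ($samples as $name => $char) {
    // strlen() reports the number of bytes in the string
    $lengths[$name] = strlen($char);
    // bin2hex() shows the actual bytes, e.g. "e282ac" for the euro sign
    echo str_pad($name, 10), strlen($char), ' byte(s): ', bin2hex($char), "\n";
}
```

Note that the ASCII character `A` is stored as the single byte 0x41, exactly as in ASCII, which is the backward compatibility mentioned above.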
A string is nothing more than an array of bytes. So a UTF-8 string is the very same as an array of bytes, except that in addition you know what the array of bytes represents.
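One way to see that a PHP string really is just bytes is to unpack it into integers. This is a sketch using the built-in `unpack()` with the `C*` format (unsigned char, repeated); note that `unpack()` returns a 1-indexed array, so `array_values()` re-indexes it:

```php
<?php
$utf8 = "\u{00E9}t\u{00E9}"; // "été" encoded as UTF-8

// String -> array of byte values (re-indexed from 0)
$bytes = array_values(unpack('C*', $utf8));

// Print each byte in hex
echo implode(' ', array_map('dechex', $bytes)), "\n"; // c3 a9 74 c3 a9
```

Three characters, five bytes: nothing in the string itself says these bytes are UTF-8.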
So your input array of bytes needs one more piece of information: its character set (character encoding). If you know the input character set, you can convert the array of bytes into another array of bytes representing a UTF-8 string.
The PHP function for doing that is called mb_convert_encoding().
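The following sketch (assuming the mbstring extension is available) shows why the source encoding has to be known: the same input byte converts to different UTF-8 sequences depending on which encoding you tell `mb_convert_encoding()` it is in. The choice of ISO-8859-1 and Windows-1252 here is just for illustration.

```php
<?php
// In ISO-8859-1, byte 0xE9 is "é" (U+00E9), which is C3 A9 in UTF-8
$fromLatin1 = mb_convert_encoding("\xE9", 'UTF-8', 'ISO-8859-1');

// Byte 0x80 differs between the two encodings:
// Windows-1252 maps it to the euro sign (U+20AC),
// ISO-8859-1 maps it to the control character U+0080
$fromCp1252    = mb_convert_encoding("\x80", 'UTF-8', 'Windows-1252');
$fromLatin1Ctl = mb_convert_encoding("\x80", 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromLatin1), ' ', bin2hex($fromCp1252), ' ', bin2hex($fromLatin1Ctl), "\n";
// c3a9 e282ac c280
```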
PHP itself has no notion of character sets (character encodings). So a string really is nothing more than an array of bytes, and the application has to know how to interpret it.
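Since the string carries no encoding label, it is up to the application to check. A minimal sketch, assuming the mbstring extension, using `mb_check_encoding()` to test whether a byte sequence is valid UTF-8:

```php
<?php
$valid   = "\xC3\xA9"; // "é" encoded as UTF-8
$invalid = "\xE9";     // "é" encoded as ISO-8859-1: not a valid UTF-8 sequence

// The application chooses which encoding to validate against
$validIsUtf8   = mb_check_encoding($valid, 'UTF-8');   // true
$invalidIsUtf8 = mb_check_encoding($invalid, 'UTF-8'); // false

var_dump($validIsUtf8, $invalidIsUtf8);
```

Passing validation only means the bytes *could* be UTF-8; it cannot tell you which encoding the producer actually used.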
So if you have an array of bytes and want to turn it into a PHP string in order to convert the character set using mb_convert_encoding(), try the following:
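```php
<?php
$input = array(0x53, 0x68, 0x69); // example byte values ("Shi" in ASCII)
$input_encoding = 'ISO-8859-1';   // placeholder: enter your input encoding here
$output = '';
// chr() turns each byte value (0-255) into a one-byte string
for ($i = 0, $j = count($input); $i < $j; ++$i) {
    $output .= chr($input[$i]);
}
// Interpret the bytes as $input_encoding and re-encode them as UTF-8
$output_utf8 = mb_convert_encoding($output, 'UTF-8', $input_encoding);
```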
$output_utf8 will then be a PHP string containing the input array of bytes converted to UTF-8. (For more examples beyond the single one above, see https://stackoverflow.com/a/5473057/530502.)
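The `chr()` loop can also be written more compactly. This is a sketch of two alternatives using built-in functions: `pack('C*', ...)` with argument unpacking (PHP 5.6+), and `implode()` over `array_map('chr', ...)`. As before, ISO-8859-1 is only an assumed placeholder for your real input encoding.

```php
<?php
$input = array(0x53, 0x68, 0x69);

// pack('C*', ...) writes each integer as one unsigned byte
$output = pack('C*', ...$input);

// Equivalent without argument unpacking: map chr() over the array and join
$alsoOutput = implode('', array_map('chr', $input));

// Assumption: the bytes are ISO-8859-1; substitute the real source encoding
$output_utf8 = mb_convert_encoding($output, 'UTF-8', 'ISO-8859-1');

var_dump($output, $output === $alsoOutput); // string(3) "Shi", bool(true)
```

All three approaches build the same byte string; the conversion step is unchanged.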