
Array of bytes to UTF-8 string in PHP? [closed]

Tags: php, utf-8

How can I convert an array of bytes into a UTF-8 string? I need this because I am extracting data from a binary format.

Asked Sep 02 '12 18:09 by HelloWorld



1 Answer

A string is nothing more than an array of bytes. So a UTF-8 string is the very same as an array of bytes, except that you additionally know what the array of bytes represents.

So your input array of bytes needs one additional piece of information: the character set (character encoding). If you know the input character set, you can convert the array of bytes to another array of bytes representing a UTF-8 string.

The PHP function for doing that is mb_convert_encoding().
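To illustrate why the input encoding matters, here is a minimal sketch that converts the same single byte under two different assumed source encodings. The byte value and the two encodings (ISO-8859-1 and ISO-8859-5) are chosen only for illustration and are not from the question; the mbstring extension is assumed to be available.

```php
<?php
// A single byte, 0xE9, stored in a PHP string.
$bytes = "\xE9";

// Interpreted as ISO-8859-1, 0xE9 is "é" (U+00E9).
$latin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// Interpreted as ISO-8859-5 (Cyrillic), the same byte is a different character.
$cyrillic = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-5');

// Show the resulting UTF-8 bytes as hex so the difference is visible.
echo bin2hex($latin1), "\n";   // c3a9
echo bin2hex($cyrillic), "\n";
```

The two results differ even though the input byte is identical, which is why the conversion cannot be done without knowing the source encoding.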

PHP itself does not know of character sets (character encodings). So a string really is nothing more than an array of bytes. The application has to know how to handle that.
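A small sketch may make this concrete. It assumes the mbstring extension is loaded and uses the two-byte UTF-8 encoding of "é" purely as an example value.

```php
<?php
// "é" encoded as UTF-8 occupies two bytes: 0xC3 0xA9.
$s = "\xC3\xA9";

// strlen() counts bytes; it has no notion of characters.
echo strlen($s), "\n";              // 2

// mb_strlen() counts characters only because we tell it the encoding.
echo mb_strlen($s, 'UTF-8'), "\n";  // 1

// Indexing a string yields a single byte, not a character.
echo bin2hex($s[0]), "\n";          // c3
```

The core string functions operate on bytes; it is the application (here, the explicit 'UTF-8' argument) that supplies the interpretation.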

So if you have an array of bytes and want to turn that into a PHP string in order to convert the character set using mb_convert_encoding(), try the following:

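// Example input: the bytes 0x53 0x68 0x69 ("Shi" in ASCII-compatible encodings).
$input = array(0x53, 0x68, 0x69);

// Build a PHP string from the array, one byte at a time.
$output = '';
foreach ($input as $byte) {
    $output .= chr($byte);
}

// Replace the placeholder with the actual encoding of the input bytes.
$output_utf8 = mb_convert_encoding($output, 'UTF-8', 'enter input encoding here');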

(Instead of the single example above, have a look at more examples at https://stackoverflow.com/a/5473057/530502.)

$output_utf8 will then hold the input array of bytes as a PHP string converted to UTF-8.
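As a more compact alternative to the loop above, the same steps can be wrapped in a function. This is only a sketch: the name bytes_to_utf8() is hypothetical, it uses pack() instead of chr(), and it assumes the input is a plain list of integers in the range 0-255 and that mbstring is available. If the bytes are already UTF-8, no conversion is needed, so the sketch only validates them.

```php
<?php
/**
 * Hypothetical helper (not from the answer): turn a list of byte values
 * (integers 0-255) into a UTF-8 string, given the source encoding.
 */
function bytes_to_utf8(array $bytes, string $sourceEncoding): string
{
    // pack('C*', ...) writes each integer as one unsigned byte.
    $raw = pack('C*', ...$bytes);

    if (strcasecmp($sourceEncoding, 'UTF-8') === 0) {
        // Already UTF-8: nothing to convert, but reject malformed input.
        if (!mb_check_encoding($raw, 'UTF-8')) {
            throw new InvalidArgumentException('Input bytes are not valid UTF-8.');
        }
        return $raw;
    }

    return mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
}

echo bytes_to_utf8([0x53, 0x68, 0x69], 'ISO-8859-1'), "\n"; // Shi
echo bin2hex(bytes_to_utf8([0xE9], 'ISO-8859-1')), "\n";    // c3a9
```

Passing the source encoding explicitly keeps the one piece of information the bytes cannot supply themselves visible at the call site.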

Answered Sep 19 '22 17:09 by Shi