Given a Unicode code point in decimal or hex, how can a CLI PHP script output the corresponding character? The chr() function does not seem to generate the proper output. Here's my test script, using the section sign U+00A7 (A7 in hex, 167 in decimal, which should be encoded as C2 A7 in UTF-8) as a test:
<?php
echo "Section sign: ".chr(167)."\n"; // Using CHR function
echo "Section sign: ".chr(0xA7)."\n";
echo "Section sign: ".pack("c", 0xA7)."\n"; // Using pack function?
echo "Section sign: §\n"; // Copy and paste of the symbol into source code
The output I get (via an SSH session to the server) is:
Section sign: ?
Section sign: ?
Section sign: ?
Section sign: §
So that proves that the terminal font I'm using has the section sign in it and that the SSH connection passes it along successfully, but chr() isn't building it properly from the code number.
If all I've got is the code number and not a copy/paste option, what options do I have?
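PHP does not offer native Unicode support: its strings are plain byte sequences, so chr() can only produce one of 256 single-byte values. PHP does provide utf8_encode() and utf8_decode() for some basic Unicode functionality, but they only convert between ISO-8859-1 and UTF-8, and they are deprecated as of PHP 8.2.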
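To make the byte-versus-character distinction concrete, here is a minimal sketch that inspects what chr() actually returns. It assumes the PCRE extension is available (it is in standard builds) and uses the /u modifier only as a convenient UTF-8 validity check.

```php
<?php
// chr() returns a one-byte string holding the given byte value.
$single = chr(0xA7);
echo strlen($single), " byte: ", bin2hex($single), "\n";   // 1 byte: a7

// The UTF-8 encoding of U+00A7 is the two-byte sequence C2 A7.
$utf8 = "\xC2\xA7";
echo strlen($utf8), " bytes: ", bin2hex($utf8), "\n";      // 2 bytes: c2a7

// preg_match() with the /u modifier returns false on invalid UTF-8 input.
var_dump(preg_match('//u', $single) !== false);  // bool(false)
var_dump(preg_match('//u', $utf8) !== false);    // bool(true)
```

A lone A7 byte is not valid UTF-8, which is why a UTF-8 terminal shows a question mark for it.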
Assuming you have iconv, here's a simple way that doesn't involve implementing UTF-8 yourself:
function unichr($i) {
return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
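The same iconv approach can be sketched with some input checking added. The name unichrChecked() is hypothetical, and the type declarations assume PHP 7 or later; the conversion itself is the same pack('V') plus iconv call, with the result checked because iconv() returns false on failure.

```php
<?php
// Hypothetical wrapper: validates the code point, then converts via iconv.
function unichrChecked(int $codepoint): string
{
    // Unicode scalar values: 0..0x10FFFF, excluding surrogates D800..DFFF.
    if ($codepoint < 0 || $codepoint > 0x10FFFF
        || ($codepoint >= 0xD800 && $codepoint <= 0xDFFF)) {
        throw new InvalidArgumentException("Not a Unicode scalar value: $codepoint");
    }
    // pack('V') yields a 32-bit little-endian integer, i.e. one UCS-4LE character.
    $utf8 = iconv('UCS-4LE', 'UTF-8', pack('V', $codepoint));
    if ($utf8 === false) {
        throw new RuntimeException("iconv could not convert code point $codepoint");
    }
    return $utf8;
}

echo "Section sign: ", unichrChecked(167), "\n";
echo bin2hex(unichrChecked(0xA7)), "\n";     // c2a7
echo bin2hex(unichrChecked(0x20AC)), "\n";   // e282ac (euro sign)
```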
Apart from the mb_ functions and iconv, PHP has no knowledge of Unicode. Without them, you'll have to UTF-8 encode the character yourself.
For that, Wikipedia has an excellent overview on how UTF-8 is structured. Here's a quick, dirty and untested function based on that article:
function codepointToUtf8($codepoint)
{
    if ($codepoint <= 0x7F) // U+0000-U+007F - 1 byte
        return chr($codepoint);
    if ($codepoint <= 0x7FF) // U+0080-U+07FF - 2 bytes
        return chr(0xC0 | ($codepoint >> 6)) . chr(0x80 | ($codepoint & 0x3F));
    if ($codepoint <= 0xFFFF) // U+0800-U+FFFF - 3 bytes
        return chr(0xE0 | ($codepoint >> 12)) . chr(0x80 | (($codepoint >> 6) & 0x3F)) . chr(0x80 | ($codepoint & 0x3F));
    else // U+010000-U+10FFFF - 4 bytes
        return chr(0xF0 | ($codepoint >> 18)) . chr(0x80 | (($codepoint >> 12) & 0x3F)) . chr(0x80 | (($codepoint >> 6) & 0x3F)) . chr(0x80 | ($codepoint & 0x3F));
}
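If hand-rolling the encoder is not required, newer PHP versions have built-in routes from a code number to a UTF-8 string. The sketch below assumes PHP 7.0+ for the \u{...} escape and PHP 7.2+ with the mbstring extension for mb_chr(); the html_entity_decode() variant needs neither and works by decoding a numeric entity built from the code point.

```php
<?php
$codepoint = 167; // U+00A7, the section sign

// PHP 7.0+: Unicode code point escape inside a double-quoted string.
echo "Section sign: \u{A7}\n";

// PHP 7.2+ with mbstring: mb_chr() takes the code point as an integer.
if (function_exists('mb_chr')) {
    echo "Section sign: ", mb_chr($codepoint, 'UTF-8'), "\n";
}

// Decode a numeric HTML entity built from the code point.
$fromEntity = html_entity_decode('&#' . $codepoint . ';', ENT_QUOTES, 'UTF-8');
echo "Section sign: ", $fromEntity, "\n";
echo bin2hex($fromEntity), "\n"; // c2a7
```

The \u{...} escape only helps when the code point is known when the script is written; for a value held in a variable, mb_chr() or the entity route is the better fit.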