PHP construct a Unicode string?

Tags: php, unicode

Given the Unicode decimal or hex number of a character that a CLI PHP script needs to output, how can PHP generate it? The chr() function does not seem to produce the proper output. Here's my test script, using the section sign U+00A7 (A7 in hex, 167 in decimal, which should be encoded as C2 A7 in UTF-8) as a test:

<?php
echo "Section sign: ".chr(167)."\n"; // Using CHR function
echo "Section sign: ".chr(0xA7)."\n";
echo "Section sign: ".pack("c", 0xA7)."\n"; // Using pack function?
echo "Section sign: §\n"; // Copy and paste of the symbol into source code

The output I get (via an SSH session to the server) is:

Section sign: ?
Section sign: ?
Section sign: ?
Section sign: §

So that proves the terminal font I'm using has the section sign in it and the SSH connection is passing it through successfully, but chr() isn't producing it properly when building it from the code number.

If all I've got is the code number and not a copy/paste option, what options do I have?

MidnightLightning asked Sep 13 '10 21:09

2 Answers

Assuming you have iconv, here's a simple way that doesn't involve implementing UTF-8 yourself:

function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
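
As a quick check, the helper can be run against the section sign from the question. This is only a sketch: it assumes the iconv extension is loaded and that the terminal expects UTF-8. The mb_chr() comparison is an addition that was not part of the answer; it needs PHP 7.2+ with mbstring, so it is guarded.

```php
<?php
// Helper from the answer above: encode a code point as UTF-8 via iconv.
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

$section = unichr(0xA7);              // U+00A7 SECTION SIGN
echo "Section sign: " . $section . "\n";
echo bin2hex($section) . "\n";        // c2a7

// Assumption: PHP 7.2+ with mbstring; skipped silently otherwise.
if (function_exists('mb_chr')) {
    var_dump(mb_chr(0xA7, 'UTF-8') === $section);
}
```

On PHP 7.0 and later, a double-quoted string can also spell the character directly as "\u{A7}", which avoids the helper when the code point is known at write time.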
bobince answered Sep 23 '22 02:09


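Outside of the mb_ functions and iconv, PHP has no knowledge of Unicode: chr() only ever produces a single byte, so chr(167) emits the lone byte A7, which is not valid UTF-8 on its own. You'll have to UTF-8 encode the character yourself.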

For that, Wikipedia has an excellent overview on how UTF-8 is structured. Here's a quick, dirty and untested function based on that article:

function codepointToUtf8($codepoint)
{
    if ($codepoint <= 0x7F) // U+0000-U+007F - 1 byte
        return chr($codepoint);
    if ($codepoint <= 0x7FF) // U+0080-U+07FF - 2 bytes
        return chr(0xC0 | ($codepoint >> 6)).chr(0x80 | ($codepoint & 0x3F));
    if ($codepoint <= 0xFFFF) // U+0800-U+FFFF - 3 bytes
        return chr(0xE0 | ($codepoint >> 12)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
    else // U+010000-U+10FFFF - 4 bytes
        return chr(0xF0 | ($codepoint >> 18)).chr(0x80 | (($codepoint >> 12) & 0x3F)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
}
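
To see what the two-byte branch does, the arithmetic can be worked through by hand for the section sign from the question. This is an illustrative sketch only; the variable names $lead and $trail are made up for the example.

```php
<?php
$codepoint = 0xA7; // U+00A7 SECTION SIGN, 167 decimal, binary 10100111

// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx
$lead  = 0xC0 | ($codepoint >> 6);   // upper bits 00010  -> 0xC2
$trail = 0x80 | ($codepoint & 0x3F); // low six bits 100111 -> 0xA7

$utf8 = chr($lead) . chr($trail);
printf("%02X %02X\n", $lead, $trail); // C2 A7
echo "Section sign: " . $utf8 . "\n";
```

The result matches the C2 A7 sequence the question expects, which is why a single chr(0xA7) call is not enough: the lead byte is missing.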
Michael Madsen answered Sep 22 '22 02:09