 

Print Unicode characters in PHP

I have a database that stores video game names containing Unicode characters, but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.

For instance, when I print all games whose names contain "Uncharted", I get this:

Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢

but it should display this:

Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™

I ran a quick JavaScript escape() on the string to see which Unicode character the ™ is and found that it's \u2122.
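A likely explanation for the garbled output (an assumption, since the page's HTTP headers aren't shown): ™ is stored as the three UTF-8 bytes E2 84 A2, and a browser that decodes the page as Windows-1252 renders each byte as its own character, giving â„¢. This minimal sketch reproduces that with mbstring (the "\u{...}" escape needs PHP 7.0+):

```php
<?php
// U+2122 TRADE MARK SIGN, written with the PHP 7.0+ code point escape.
$tm = "\u{2122}";

// Encoded as UTF-8 it occupies three bytes.
echo bin2hex($tm), "\n"; // e284a2

// Simulate a UTF-8 page that is decoded as Windows-1252: treat each byte as
// a Windows-1252 character, then re-encode the result as UTF-8 for display.
$garbled = mb_convert_encoding($tm, 'UTF-8', 'Windows-1252');
echo $garbled, "\n"; // â„¢
```

If that is what's happening, the bytes coming out of the database are already correct and only the charset the browser assumes is wrong.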

I don't mind fully escaping every character in the string as long as the character displays correctly. My guess is that I need to find the hexadecimal code point of each character and have PHP output it as a numeric character reference, like this:

print "&#x2122;";

Please guide me to the best approach for escaping a Unicode string so that it is HTML-friendly. I did something similar in JavaScript a while back, but JavaScript has built-in escape() and unescape() functions.

I'm not aware of any PHP functions with similar functionality, however. I have read about the ord() function, but it only returns the value of a single byte (0-255), not the code point of a multibyte character, so it doesn't work for characters like ™. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
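The approach guessed at in the question (emit a hexadecimal character reference for each non-ASCII character) can be sketched with mbstring. The helper name to_html_hex_entities() is hypothetical; the sketch assumes the input is valid UTF-8 and PHP 7.4+ (for mb_str_split() and mb_ord()).

```php
<?php
// Hypothetical helper: ASCII characters are passed through htmlspecialchars(),
// every other character becomes an HTML hex reference such as &#x2122;.
// Assumes valid UTF-8 input and PHP 7.4+ with the mbstring extension.
function to_html_hex_entities(string $s): string
{
    $out = '';
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8'); // Unicode code point of this character
        $out .= $cp < 0x80
            ? htmlspecialchars($ch, ENT_QUOTES, 'UTF-8')
            : sprintf('&#x%X;', $cp);
    }
    return $out;
}

echo to_html_hex_entities("Uncharted: Drake's Fortune\u{2122}"), "\n";
// Uncharted: Drake&#039;s Fortune&#x2122;
```

mbstring's mb_encode_numericentity() performs a similar conversion in one call, driven by a code-point map. Either way, if the page is served with the correct charset, this escaping is usually unnecessary.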

asked Jul 09 '13 03:07 by Cameron Tinker


People also ask

How to print Unicode characters in PHP?

The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header before any output: header("Content-Type: text/html; charset=UTF-8"); Then you can leave the rest of your code as-is, with no need to HTML-encode entities or create other mess.
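A minimal sketch of that approach, assuming the titles are already stored and fetched as UTF-8 (the $title value here is a stand-in for a database row):

```php
<?php
// Must run before any output is sent; otherwise PHP cannot send the header
// and emits a "headers already sent" warning.
header('Content-Type: text/html; charset=UTF-8');

// Stand-in for a title fetched from the database, assumed to be UTF-8.
$title = "Uncharted 2: Among Thieves\u{2122}";

// Only the HTML-special characters are escaped; ™ passes through as raw UTF-8.
echo "<!DOCTYPE html>\n<meta charset=\"UTF-8\">\n";
echo '<p>', htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), "</p>\n";
```

If the data comes from MySQL, the connection character set has to match as well; with mysqli that is set via mysqli::set_charset() (for example 'utf8mb4').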

Does PHP use Unicode?

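PHP has no native Unicode string type: PHP strings are sequences of bytes, and byte-oriented functions such as strlen(), substr() and ord() count bytes, not characters. Unicode-aware handling comes from extensions such as mbstring (mb_strlen(), mb_ord(), mb_chr()) and intl (IntlChar, Normalizer). The older utf8_encode() and utf8_decode() functions only convert between ISO-8859-1 and UTF-8, and they are deprecated as of PHP 8.2. See the PHP manual's string documentation for more details about PHP and Unicode.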
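To see the byte/character distinction (and why ord() falls short in the question), compare the byte-oriented functions with their mbstring counterparts on ™. A small sketch; mb_ord() and mb_chr() need PHP 7.2+:

```php
<?php
$tm = "\u{2122}"; // ™, UTF-8 bytes E2 84 A2

var_dump(strlen($tm));             // int(3): bytes
var_dump(mb_strlen($tm, 'UTF-8')); // int(1): characters
var_dump(ord($tm));                // int(226): first byte only (0xE2)
var_dump(mb_ord($tm, 'UTF-8'));    // int(8482): code point U+2122

// mb_chr() goes the other way, from a code point to a UTF-8 string.
var_dump(mb_chr(0x2122, 'UTF-8') === $tm); // bool(true)
```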

How do I type a Unicode character by its code?

In Microsoft Word, type the character's hexadecimal code and then press Alt+X. For example, to type a dollar sign ($), type 0024 and press Alt+X.


2 Answers

<?php
// IntlChar::chr() requires PHP 7.0+ with the intl extension
var_dump(
    IntlChar::chr(0x2122),
    IntlChar::chr(0x1F638)
);

var_dump(
    utf8_chr(0x2122),
    utf8_chr(0x1F638)
);

function utf8_chr($cp) {

    if (!is_int($cp)) {
        throw new InvalidArgumentException("$cp is not an integer");
    }

    // UTF-8 prohibits characters between U+D800 and U+DFFF
    // https://tools.ietf.org/html/rfc3629#section-3
    //
    // Q: Are there any 16-bit values that are invalid?
    // http://unicode.org/faq/utf_bom.html#utf16-7

    if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
        throw new RangeException("$cp is out of range");
    }

    // BMP code point: a single \uXXXX JSON escape is enough
    if ($cp < 0x10000) {
        return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
    }

    // Q: Isn’t there a simpler way to do this?
    // http://unicode.org/faq/utf_bom.html#utf16-4
    $lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
    $trail = 0xDC00 + ($cp & 0x3FF);

    return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}
answered Sep 21 '22 12:09 by masakielastic
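For comparison with the json_decode() round-trip in utf8_chr(), here is a sketch that builds the UTF-8 bytes directly with bit operations, following the byte layout in RFC 3629. The name utf8_chr_bits() is hypothetical. On PHP 7.2+, mb_chr($cp, 'UTF-8') does the same job in one call.

```php
<?php
// Hypothetical alternative to utf8_chr(): encode a code point as UTF-8 bytes.
function utf8_chr_bits(int $cp): string
{
    // Reject negatives, UTF-16 surrogates and values beyond U+10FFFF.
    if ($cp < 0 || ($cp >= 0xD800 && $cp <= 0xDFFF) || $cp > 0x10FFFF) {
        throw new RangeException("$cp is not a valid Unicode scalar value");
    }
    if ($cp < 0x80) {    // 1 byte:  0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {   // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) { // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8_chr_bits(0x2122)), "\n";  // e284a2
echo bin2hex(utf8_chr_bits(0x1F638)), "\n"; // f09f98b8
```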


I spent a lot of time looking for a simple way to print the character for a given Unicode code point, and the methods I found either didn't work or were very complicated.

That said, JSON can represent Unicode characters with the escape syntax "\uXXXX" (four hex digits), so:

echo json_decode('"\u00e1"'); 

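prints the corresponding Unicode character, in this case: á.

P.S. Note both kinds of quotes: the inner double quotes make the argument a JSON string literal (without them json_decode() returns null), and the outer single quotes stop PHP from interpreting the backslash sequence itself. If you leave either out, it won't work.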

answered Sep 23 '22 12:09 by sh4
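The JSON trick works because json_decode() turns a \uXXXX escape into UTF-8. On PHP 7.0+ the same character can also be written directly with the "\u{...}" string escape, or built from an integer with mb_chr() (PHP 7.2+). A quick comparison, including the UTF-16 surrogate pair that JSON needs above U+FFFF:

```php
<?php
$viaJson   = json_decode('"\u00e1"'); // JSON \uXXXX escape, as in the answer
$viaEscape = "\u{00e1}";              // PHP 7.0+ double-quoted string escape
$viaMb     = mb_chr(0xE1, 'UTF-8');   // mbstring, from an integer code point

var_dump($viaJson === $viaEscape, $viaEscape === $viaMb); // bool(true) bool(true)

// \uXXXX takes exactly four hex digits, so code points above U+FFFF must be
// written as a surrogate pair in JSON: U+1F638 becomes \ud83d\ude38.
var_dump(json_decode('"\ud83d\ude38"') === "\u{1F638}"); // bool(true)
```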