I have a database which stores video game names with Unicode characters but I can't figure out how to properly escape these Unicode characters when printing them to an HTML response.
For instance, when I print all games with the name like Uncharted, I get this:
Uncharted: Drake's Fortuneâ„¢
Uncharted 2: Among Thievesâ„¢
Uncharted 3: Drake's Deceptionâ„¢
but it should display this:
Uncharted: Drake's Fortune™
Uncharted 2: Among Thieves™
Uncharted 3: Drake's Deception™
I ran a quick JavaScript escape function to see which Unicode character the ™
is and found that it's \u2122
.
I don't have a problem fully escaping every character in the string if I can get the ™
character to display correctly. My guess is to somehow find the hex representation of each character in the string and have PHP render the Unicode characters like this:
print "™";
Please guide me through the best approach for Unicode escaping a string for being HTML friendly. I've done something similar for JavaScript a while back, but JavaScript has a built in function for escape and unescape.
I'm not aware of any PHP functions of similar functionality however. I have read about the ord function, but it just returns the ASCII character code for a given character, hence the improper display of the ™
or the ™
. I would like this function to be versatile enough to apply to any string containing valid Unicode characters.
The best way is to tell the browser that UTF-8 is being used by sending the corresponding HTTP header: header("content-type: text/html; charset=UTF-8"); Then, you can leave the rest of your code as-is and don't have to html-encode entities or create other mess.
PHP does not offer native Unicode support. PHP only supports a 256-character set. However, PHP provides the UTF-8 functions utf8_encode() and utf8_decode() to provide some basic Unicode functionality. See the PHP manual for strings for more details about PHP and Unicode.
To insert a Unicode character, type the character code, press ALT, and then press X. For example, to type a dollar symbol ($), type 0024, press ALT, and then press X.
// PHP 7.0
var_dump(
IntlChar::chr(0x2122),
IntlChar::chr(0x1F638)
);
var_dump(
utf8_chr(0x2122),
utf8_chr(0x1F638)
);
function utf8_chr($cp) {
if (!is_int($cp)) {
exit("$cp is not integer\n");
}
// UTF-8 prohibits characters between U+D800 and U+DFFF
// https://tools.ietf.org/html/rfc3629#section-3
//
// Q: Are there any 16-bit values that are invalid?
// http://unicode.org/faq/utf_bom.html#utf16-7
if ($cp < 0 || (0xD7FF < $cp && $cp < 0xE000) || 0x10FFFF < $cp) {
exit("$cp is out of range\n");
}
if ($cp < 0x10000) {
return json_decode('"\u'.bin2hex(pack('n', $cp)).'"');
}
// Q: Isn’t there a simpler way to do this?
// http://unicode.org/faq/utf_bom.html#utf16-4
$lead = 0xD800 - (0x10000 >> 10) + ($cp >> 10);
$trail = 0xDC00 + ($cp & 0x3FF);
return json_decode('"\u'.bin2hex(pack('n', $lead)).'\u'.bin2hex(pack('n', $trail)).'"');
}
I spent a lot of time trying to find the better way to just print the equivalent char of an unicode code, and the methods I found didn't work or it just were very complicated.
This said, JSON is able to represent unicode characters using the syntax "\u[unicode_code]", then:
echo json_decode('"\u00e1"');
Will print the equivalent unicode char, in this case: á.
P.D. Note the simple and the double quotes. If you don't put both it won't work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With