Say I wanted to print a ÿ (Latin small letter y with diaeresis) from its Unicode code point, U+00FF, or from its UTF-8 byte sequence in hex, c3 bf. How can I do that in PHP?
The reason is that I need to create certain UTF-8 characters for testing my regex and string functions. However, since I have fewer than 200 keys on my keyboard I can't type them - and since many times I am stuck in an ASCII-only world - I need to be able to create them based solely on their ASCII-safe code point or byte values.
Note: For it to show correctly in a browser, I know that the first step is
header('Content-Type: text/html; charset=utf-8');
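UTF-8 treats byte values 0-127 as ASCII, 192-247 as lead bytes (think of them as Shift keys), and 128-191 as continuation bytes (the keys being shifted). Valid UTF-8 only ever uses lead bytes 194-244. For instance, lead bytes 208 and 209 shift you into the Cyrillic range: 208 followed by 175 is code point 1071 (U+042F), the Cyrillic Я.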
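The 208/175 arithmetic above can be checked with a few bit operations. This is only a sketch: decode_two_byte is a hypothetical helper name, and it handles just the two-byte case.

```php
<?php
// Hypothetical helper: decode one two-byte UTF-8 sequence given as byte values.
function decode_two_byte(int $lead, int $cont): int
{
    // A two-byte lead byte looks like 110xxxxx; a continuation byte like 10xxxxxx.
    if (($lead & 0xE0) !== 0xC0 || ($cont & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('not a two-byte UTF-8 sequence');
    }
    // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

echo decode_two_byte(208, 175), "\n"; // 1071, i.e. U+042F
echo chr(208) . chr(175), "\n";       // the same two bytes, shown as Я by a UTF-8 terminal
```

Going the other way, chr(208) . chr(175) is already a valid UTF-8 string, so byte values alone are enough to build a character from ASCII-only source.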
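If our byte is non-negative (8th bit set to 0), this means that it's an ASCII character: if ( myByte >= 0 ) return myByte; Code points greater than 127 are encoded as several bytes. On the other hand, if our byte is negative (8th bit set to 1), this means that it's part of a UTF-8 encoded character whose code point is greater than 127. Note that this sign test assumes a language with a signed byte type; in PHP, ord() returns 0-255, so the equivalent test is ord($byte) < 128.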
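In PHP the same check can be written against the top bits of ord(). A minimal sketch, where classify_byte is a made-up name and no validation of whole sequences is attempted:

```php
<?php
// Hypothetical helper: classify a single byte taken from a UTF-8 string.
function classify_byte(string $byte): string
{
    $value = ord($byte);          // 0..255, never negative in PHP
    if ($value < 0x80) {
        return 'ascii';           // 0xxxxxxx
    }
    if (($value & 0xC0) === 0x80) {
        return 'continuation';    // 10xxxxxx
    }
    return 'lead';                // 11xxxxxx (bytes that can never occur in UTF-8 also land here)
}

foreach (str_split("a\xC3\xBF") as $byte) {
    printf("0x%02X %s\n", ord($byte), classify_byte($byte));
}
```

For the string "a" followed by ÿ this prints 0x61 ascii, 0xC3 lead, 0xBF continuation.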
Each UTF can represent any Unicode character that you need to represent. UTF-8 is based on 8-bit code units. Each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8.
UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character. The first 128 code points (numbered 0-127) are encoded as the same single bytes as in ASCII, meaning that existing ASCII text is already valid UTF-8.
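The one-to-four-byte rule is easy to see with strlen(), which counts bytes rather than characters. This sketch assumes PHP 7.0 or later for the "\u{...}" escape; the sample code points are arbitrary picks, one per length.

```php
<?php
// strlen() counts bytes, so it reveals the encoded length of each character.
$samples = [
    'U+0041'  => "\u{41}",    // A, plain ASCII
    'U+00FF'  => "\u{FF}",    // y with diaeresis
    'U+24BD'  => "\u{24BD}",  // circled H
    'U+1F600' => "\u{1F600}", // an emoji outside the Basic Multilingual Plane
];
foreach ($samples as $label => $char) {
    echo $label, ': ', strlen($char), " byte(s)\n";
}
```

The four lines report 1, 2, 3 and 4 bytes respectively, and "\u{41}" is byte-for-byte the same string as "A".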
Well, you have everything you need.
Hex escapes are recognized in double-quoted strings as well:
echo "\xc3\xbf";
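If you would rather start from the code point than from the bytes, two other ASCII-only spellings give the same string. A short sketch, assuming PHP 7.0 or later for the "\u{...}" escape (the variable names are just for illustration):

```php
<?php
$fromBytes     = "\xc3\xbf";       // raw UTF-8 bytes, as above
$fromCodePoint = "\u{00FF}";       // Unicode code point escape (PHP 7.0+)
$fromHex       = hex2bin('c3bf');  // hex string converted to bytes

// All three are the same two-byte string.
var_dump($fromBytes === $fromCodePoint, $fromBytes === $fromHex);
```

Both comparisons come out true. Note that these escapes only work in double-quoted strings and heredocs, not in single-quoted strings.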
Solution 1 with a small pack function
<?php
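// Encode code point $n as UTF-8 bytes; returns '' for values above U+10FFFF.
function chr_utf8($n, $f = 'C*') {
    if ($n < 0x80) return chr($n); // 1 byte: 0xxxxxxx
    if ($n < 0x800) return pack($f, 0xC0 | ($n >> 6), 0x80 | ($n & 0x3F)); // 2 bytes
    if ($n < 0x10000) return pack($f, 0xE0 | ($n >> 12), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 3 bytes
    if ($n < 0x110000) return pack($f, 0xF0 | ($n >> 18), 0x80 | (($n >> 12) & 0x3F), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 4 bytes
    return '';
}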
echo chr_utf8(9405).chr_utf8(9402).chr_utf8(9409).chr_utf8(9409).chr_utf8(9412);
//Output ⒽⒺⓁⓁⓄ
Check it in https://eval.in/748062 …
Solution 2 with json_decode
<?php
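// Build a JSON array of "\uXXXX" escapes for code points 0..55295 (everything below the surrogate range).
$utf8_char = '["';
for ($number = 0; $number < 55296; $number++) {
    $utf8_char .= '\u' . substr('000' . strtoupper(dechex($number)), -4) . '","';
}
// Drop the trailing ," and close the array; after decoding, the index is the code point.
$utf8_char = json_decode(substr($utf8_char, 0, -2) . ']');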
echo $utf8_char[9405].$utf8_char[9402].$utf8_char[9409].$utf8_char[9409].$utf8_char[9412];
//Output ⒽⒺⓁⓁⓄ
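If you only need a handful of characters, the same json_decode trick can be applied one code point at a time instead of building the whole table. A sketch under a few assumptions: json_chr is a hypothetical name, and code points above U+FFFF are written as a surrogate pair because JSON's \u escape only takes four hex digits.

```php
<?php
// Hypothetical helper: build one character by decoding a JSON string literal.
function json_chr(int $cp): string
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('not a Unicode scalar value');
    }
    if ($cp < 0x10000) {
        $escape = sprintf('\u%04x', $cp);
    } else {
        // Split the code point into a high and a low surrogate.
        $cp -= 0x10000;
        $escape = sprintf('\u%04x\u%04x', 0xD800 | ($cp >> 10), 0xDC00 | ($cp & 0x3FF));
    }
    return json_decode('"' . $escape . '"');
}

echo json_chr(9405) . json_chr(9402) . json_chr(9409) . json_chr(9409) . json_chr(9412), "\n";
```

This avoids decoding 55296 entries up front and also reaches code points beyond the table's limit, at the cost of one json_decode call per character.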