How do you print raw UTF-8 characters from their numbers?

Tags: php, unicode, utf-8

Say I wanted to print a ÿ (Latin small letter y with diaeresis) from its Unicode code point, U+00FF, or from its UTF-8 bytes, c3 bf in hex. How can I do that in PHP?

I need to create certain UTF-8 characters for testing my regex and string functions. However, since I have fewer than 200 keys on my keyboard, I can't type them, and since I am often stuck in an ASCII-only world, I need to be able to create them based solely on their ASCII-safe character codes.

Note: I know that for it to show correctly in a browser, the first step is

header('Content-Type: text/html; charset=utf-8');
Xeoncross asked May 01 '10 00:05


2 Answers

Well, you already have everything you need.
Hex escape sequences are recognized in double-quoted strings, so you can write the UTF-8 bytes directly:

echo "\xc3\xbf";
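On newer PHP versions it is also possible to start from the code point number rather than the hand-computed bytes. A short sketch, assuming PHP 7.2 or later with the mbstring extension loaded (neither is stated in the question; the variable names are only illustrative):

```php
<?php
// Three ways to produce U+00FF; each yields the same two bytes, c3 bf.
$fromBytes    = "\xc3\xbf";             // raw UTF-8 bytes, as in the echo above
$fromEscape   = "\u{ff}";               // code point escape, PHP 7.0+
$fromFunction = mb_chr(0xFF, 'UTF-8');  // PHP 7.2+, needs mbstring

echo bin2hex($fromBytes), ' ', bin2hex($fromEscape), ' ', bin2hex($fromFunction), "\n";
// c3bf c3bf c3bf
```

All three forms use only ASCII in the source file, which is what the question asks for.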
Your Common Sense answered Oct 05 '22 23:10


Solution 1 with a small pack function

<?php

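// Encode the Unicode code point $n as UTF-8 bytes; returns '' above U+10FFFF.
function chr_utf8($n, $f = 'C*') {
    if ($n < 0x80) return chr($n); // 1 byte: 0xxxxxxx
    if ($n < 0x800) return pack($f, 0xC0 | ($n >> 6), 0x80 | ($n & 0x3F)); // 2 bytes
    if ($n < 0x10000) return pack($f, 0xE0 | ($n >> 12), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 3 bytes
    if ($n < 0x110000) return pack($f, 0xF0 | ($n >> 18), 0x80 | (($n >> 12) & 0x3F), 0x80 | (($n >> 6) & 0x3F), 0x80 | ($n & 0x3F)); // 4 bytes
    return '';
}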

echo chr_utf8(9405).chr_utf8(9402).chr_utf8(9409).chr_utf8(9409).chr_utf8(9412);

//Output ⒽⒺⓁⓁⓄ

Check it in https://eval.in/748062 …
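To connect this back to the question, here is the two-byte case worked by hand for U+00FF. It is only a sketch of the bit arithmetic, with illustrative variable names, and it covers nothing beyond code points U+0080 to U+07FF:

```php
<?php
// Two-byte UTF-8 layout: 110xxxxx 10xxxxxx (code points U+0080 to U+07FF).
$n = 0xFF;                    // code point of ÿ, binary 000 1111 1111

$lead = 0xC0 | ($n >> 6);     // top 5 bits follow the 110 prefix: 0xC3
$cont = 0x80 | ($n & 0x3F);   // low 6 bits follow the 10 prefix:  0xBF

$char = chr($lead) . chr($cont);
echo bin2hex($char), "\n";    // c3bf
```

The result matches the c3 bf bytes given in the question.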

Solution 2 with json_decode

<?php

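// Build a JSON array of "\uXXXX" escapes for U+0000 to U+D7FF (everything below the surrogate range), then decode it.
$utf8_char = '["';
for ($number = 0; $number < 55296; $number++) {
    $utf8_char .= '\u' . substr('000' . strtoupper(dechex($number)), -4) . '","';
}
$utf8_char = json_decode(substr($utf8_char, 0, -2) . ']');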

echo $utf8_char[9405].$utf8_char[9402].$utf8_char[9409].$utf8_char[9409].$utf8_char[9412];

//Output ⒽⒺⓁⓁⓄ
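If only a handful of characters are needed, the same json_decode idea can be applied to one code point at a time instead of building the whole table. A minimal sketch, assuming PHP 7 or later for the type declarations; `json_chr` is a hypothetical helper name, not a built-in function:

```php
<?php
// Hypothetical helper: decode a single \uXXXX JSON escape.
// Only handles BMP code points outside the surrogate range (U+D800 to U+DFFF).
function json_chr(int $n): string
{
    if ($n < 0 || $n > 0xFFFF || ($n >= 0xD800 && $n <= 0xDFFF)) {
        throw new InvalidArgumentException("Unsupported code point: $n");
    }
    // In a single-quoted string, '\\u' is a literal backslash followed by u.
    return json_decode(sprintf('"\\u%04x"', $n));
}

echo bin2hex(json_chr(0xFF)), "\n"; // c3bf
```

Code points above U+FFFF would need a surrogate pair in JSON, which this sketch deliberately leaves out.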
Php'Regex answered Oct 06 '22 01:10