I have my data in this format: U+597D or like this: U+6211. I want to convert them to UTF-8 (the original characters are 好 and 我). How can I do it?
$utf8string = html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string), ENT_NOQUOTES, 'UTF-8');
is probably the simplest solution.
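If the input mixes several such tokens with other text, the same idea can be sketched with a callback instead of an HTML-entity round trip. This is only a sketch under assumptions: the helper name decodeUnicodeNotation() is made up for illustration, and it relies on mb_chr(), which needs PHP 7.2+ with the mbstring extension.

```php
<?php
// Hypothetical helper (not part of the original answer): replaces every
// U+XXXX token (4 to 6 uppercase hex digits) in $text with its UTF-8 character.
// Assumes PHP 7.2+ with the mbstring extension, which provides mb_chr().
function decodeUnicodeNotation(string $text): string
{
    return preg_replace_callback(
        '/U\+([0-9A-F]{4,6})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for an invalid code point;
            // in that case the original token is left untouched.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo decodeUnicodeNotation('U+6211 U+597D'), "\n"; // 我 好
```

Note that the {4,6} quantifier is greedy, so a token immediately followed by more uppercase hex digits (for example "U+597DAB") is read as one longer code point. Separating tokens with spaces or punctuation avoids that ambiguity.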
// Encode a Unicode code point as a UTF-8 string (counterpart to chr()).
function utf8($num)
{
    if ($num <= 0x7F) return chr($num);
    if ($num <= 0x7FF) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
    if ($num <= 0xFFFF) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    if ($num <= 0x1FFFFF) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';
}

// Decode the first UTF-8 character of $c to its code point (counterpart to ord()).
function uniord($c)
{
    $ord0 = ord($c[0]);
    if ($ord0 >= 0 && $ord0 <= 127) return $ord0;
    $ord1 = ord($c[1]);
    if ($ord0 >= 192 && $ord0 <= 223) return ($ord0 - 192) * 64 + ($ord1 - 128);
    $ord2 = ord($c[2]);
    if ($ord0 >= 224 && $ord0 <= 239) return ($ord0 - 224) * 4096 + ($ord1 - 128) * 64 + ($ord2 - 128);
    $ord3 = ord($c[3]);
    if ($ord0 >= 240 && $ord0 <= 247) return ($ord0 - 240) * 262144 + ($ord1 - 128) * 4096 + ($ord2 - 128) * 64 + ($ord3 - 128);
    return false;
}
utf8() and uniord() are meant to mirror PHP's chr() and ord() functions, but for Unicode code points:
echo utf8(0x6211)."\n";
echo uniord(utf8(0x6211))."\n";
echo "U+".dechex(uniord(utf8(0x6211)))."\n";
// In your case:
$wo = 'U+6211';
$hao = 'U+597D';
echo utf8(hexdec(str_replace("U+", "", $wo)))."\n";
echo utf8(hexdec(str_replace("U+", "", $hao)))."\n";
output:
我
25105
U+6211
我
好
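For comparison, the same round trip can be sketched with PHP's built-in mb_chr() and mb_ord() instead of hand-written helpers. This assumes PHP 7.2+ with the mbstring extension, which may not match your setup; the variable names are only illustrative.

```php
<?php
// Assumes PHP 7.2+ with mbstring, which provides mb_chr() and mb_ord().
$wo = 'U+6211';

// Strip the "U+" prefix and parse the remaining hex digits.
$codePoint = hexdec(substr($wo, 2));

// Code point to UTF-8 character, then back again.
$char = mb_chr($codePoint, 'UTF-8');
echo $char, "\n";                                     // 我
echo mb_ord($char, 'UTF-8'), "\n";                    // 25105
echo sprintf('U+%04X', mb_ord($char, 'UTF-8')), "\n"; // U+6211
```

Unlike dechex(), sprintf('%04X', ...) pads to four digits and prints uppercase hex, so code points below U+1000 keep the usual U+XXXX shape.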