
PHP: Convert unicode codepoint to UTF-8

Tags: php, unicode, utf-8

My data is in the form U+597D or U+6211. I want to convert these code points to UTF-8 (the original characters are 好 and 我). How can I do it?

asked Nov 26 '09 21:11 by Anthony


2 Answers

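    $utf8string = html_entity_decode(
        preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", $string),
        ENT_NOQUOTES,
        'UTF-8'
    );

is probably the simplest solution. Note that the pattern matches exactly four uppercase hex digits, so it only handles code points up to U+FFFF.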
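
The one-liner works in two steps: preg_replace() rewrites each U+XXXX token as an HTML numeric character reference (&#xXXXX;), and html_entity_decode() then turns that reference into UTF-8 bytes. The same idea can be sketched without the HTML detour. This is only a sketch, assuming PHP 7.2+ with the mbstring extension (for mb_chr()); the function name decodeUPlus is hypothetical, and it accepts 4 to 6 hex digits in either case:

```php
<?php
// Hypothetical helper: replace every "U+XXXX" token (4 to 6 hex digits)
// in $text with the UTF-8 encoding of that code point.
function decodeUPlus(string $text): string
{
    return preg_replace_callback(
        '/U\+([0-9A-Fa-f]{4,6})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false on failure; keep the original token then.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo decodeUPlus('U+6211 U+597D'), "\n"; // 我 好
```

Because the pattern is greedy, tokens should be separated from any following hex-like text (for example by a space), otherwise trailing letters a-f may be read as part of the code point.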

answered Sep 20 '22 09:09 by Mez


    // Encode a Unicode code point as a UTF-8 byte string.
    function utf8($num) {
        if ($num <= 0x7F)     return chr($num);
        if ($num <= 0x7FF)    return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
        if ($num <= 0xFFFF)   return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        if ($num <= 0x1FFFFF) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        return '';
    }

    // Decode the first UTF-8 character of $c into its code point.
    // Uses $c[0] rather than $c{0}: the curly-brace offset syntax was removed in PHP 8.
    function uniord($c) {
        $ord0 = ord($c[0]); if ($ord0 >= 0   && $ord0 <= 127) return $ord0;
        $ord1 = ord($c[1]); if ($ord0 >= 192 && $ord0 <= 223) return ($ord0 - 192) * 64 + ($ord1 - 128);
        $ord2 = ord($c[2]); if ($ord0 >= 224 && $ord0 <= 239) return ($ord0 - 224) * 4096 + ($ord1 - 128) * 64 + ($ord2 - 128);
        $ord3 = ord($c[3]); if ($ord0 >= 240 && $ord0 <= 247) return ($ord0 - 240) * 262144 + ($ord1 - 128) * 4096 + ($ord2 - 128) * 64 + ($ord3 - 128);
        return false;
    }

utf8() and uniord() are meant to mirror PHP's chr() and ord() functions for multi-byte characters:

    echo utf8(0x6211)."\n";
    echo uniord(utf8(0x6211))."\n";
    echo "U+".dechex(uniord(utf8(0x6211)))."\n";

    // In your case:
    $wo = 'U+6211';
    $hao = 'U+597D';
    echo utf8(hexdec(str_replace("U+", "", $wo)))."\n";
    echo utf8(hexdec(str_replace("U+", "", $hao)))."\n";

Output:

    我
    25105
    U+6211
    我
    好
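
To make the bit arithmetic concrete: a code point in the range U+0800 to U+FFFF is split into 4 + 6 + 6 bits, which are placed into the byte pattern 1110xxxx 10xxxxxx 10xxxxxx. The sketch below traces this for the two characters from the question and prints the resulting bytes in hex. The function name encodeThreeByte is hypothetical, and it deliberately covers only the three-byte range:

```php
<?php
// Hypothetical helper: encode a code point in U+0800..U+FFFF as three UTF-8 bytes.
function encodeThreeByte(int $cp): string
{
    if ($cp < 0x800 || $cp > 0xFFFF) {
        throw new InvalidArgumentException('only the three-byte range is handled here');
    }
    if ($cp >= 0xD800 && $cp <= 0xDFFF) {
        throw new InvalidArgumentException('surrogates are not valid code points in UTF-8');
    }
    $b1 = 0xE0 | ($cp >> 12);          // 1110xxxx: top 4 bits
    $b2 = 0x80 | (($cp >> 6) & 0x3F);  // 10xxxxxx: middle 6 bits
    $b3 = 0x80 | ($cp & 0x3F);         // 10xxxxxx: low 6 bits
    return chr($b1) . chr($b2) . chr($b3);
}

echo bin2hex(encodeThreeByte(0x6211)), "\n"; // e68891
echo bin2hex(encodeThreeByte(0x597D)), "\n"; // e5a5bd
```

For U+6211 (binary 0110 0010 0001 0001) the three groups are 0110, 001000 and 010001, giving the bytes E6 88 91, which is 我 in UTF-8.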

answered Sep 20 '22 09:09 by velcrow