
convert unicode to html entities hex

How to convert a Unicode string to HTML entities? (HEX not decimal)

For example, convert Français to Fran&#xE7;ais.

mrdaliri asked Nov 07 '12 23:11


3 Answers

For the missing hex-encoding in the related question:

$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    list($utf8) = $match;
    $binary = mb_convert_encoding($utf8, 'UTF-32BE', 'UTF-8');
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);

This is similar to @Baba's answer: it converts each character to UTF-32BE, then uses unpack and vsprintf to format the code point as hex.
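To see the callback at work, here is a minimal sketch that wraps it in a function and runs it on the sample string from the question. The name utf8_to_hex_entities is made up for illustration, and the sketch assumes the mbstring extension is loaded, the input is valid UTF-8, and the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper name; not part of the original answer.
function utf8_to_hex_entities(string $input): string
{
    // The /u flag makes PCRE treat pattern and subject as UTF-8,
    // so the character class matches every code point above ASCII.
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $match): string {
        // UTF-32BE stores each code point as one 4-byte big-endian integer.
        $binary = mb_convert_encoding($match[0], 'UTF-32BE', 'UTF-8');
        // 'N' reads an unsigned 32-bit big-endian integer; unpack() indexes from 1.
        $codePoint = unpack('N', $binary)[1];
        return sprintf('&#x%X;', $codePoint);
    }, $input);
}

echo utf8_to_hex_entities('Français'), "\n"; // Fran&#xE7;ais
```

Plain ASCII passes through untouched, so the function can be applied to whole strings rather than to single characters.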

If you prefer iconv over mb_convert_encoding, it's similar:

$output = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($match) {
    list($utf8) = $match;
    $binary = iconv('UTF-8', 'UTF-32BE', $utf8);
    $entity = vsprintf('&#x%X;', unpack('N', $binary));
    return $entity;
}, $input);

I find this string manipulation a bit clearer than the one in "Get hexcode of html entities".

hakre answered Oct 01 '22 02:10


Your string looks like UCS-4 encoding. You can try:

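$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);
    // Big-endian is requested explicitly so the hex dump reads as the code point.
    $utf = iconv('UTF-8', 'UCS-4BE', $char);
    // Strip the leading zeros of the 4-byte value, e.g. 000000E7 becomes E7.
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);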

Output

string 'Fran&#xE7;ais' (length=13)
Baba answered Oct 01 '22 03:10


Firstly, when I faced this problem recently, I solved it by making sure my code files, DB connection, and DB tables were all UTF-8. Then simply echoing the text works. If you must escape the output from the DB, use htmlspecialchars() rather than htmlentities(), so that the UTF-8 characters are left alone instead of being escaped.
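A small sketch of the difference between the two functions, assuming a UTF-8 source file; the sample string is invented for illustration:

```php
<?php
$text = 'Français <b>&</b>';

// htmlspecialchars() escapes only &, <, > and quotes; UTF-8 letters pass through.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() additionally replaces every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Français &lt;b&gt;&amp;&lt;/b&gt;
echo $entities, "\n"; // Fran&ccedil;ais &lt;b&gt;&amp;&lt;/b&gt;
```

Note that htmlentities() produces named entities here, not the hex form the question asks for.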

I would like to document an alternative solution because it solved a similar problem for me. I was using PHP's utf8_encode() to escape 'special' characters.

I wanted to convert them into HTML entities for display. I wrote this code because I wanted to avoid iconv and similar functions as far as possible, since not all environments necessarily have them (do correct me if that is not so!):

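// Single quotes keep \u03b5 as a literal backslash escape in the string.
$foo = 'This is my test string \u03b50';
echo unicode2html($foo);

function unicode2html($string) {
    // Match a literal backslash, "u", and exactly four hex digits (either case).
    return preg_replace('/\\\\u([0-9a-fA-F]{4})/', '&#x$1;', $string);
}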
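For reference, a short sketch of what this kind of replacement produces. The function name escapes_to_entities is hypothetical, chosen only to keep the sketch separate from the code above, and it uses the /i flag as one possible way to accept upper-case hex digits:

```php
<?php
// Hypothetical name; same idea as the function above.
function escapes_to_entities(string $string): string
{
    // In a single-quoted PHP string, '\\\\u' reaches PCRE as \\u,
    // which matches a literal backslash followed by "u".
    return preg_replace('/\\\\u([0-9a-f]{4})/i', '&#x$1;', $string);
}

echo escapes_to_entities('This is my test string \u03b50'), "\n";
// This is my test string &#x03b5;0
echo escapes_to_entities('Fran\u00E7ais'), "\n";
// Fran&#x00E7;ais
```

Only the four hex digits after \u are consumed, so the trailing 0 in the first string stays as ordinary text.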

Hope this helps somebody in need :-)

Angad answered Oct 01 '22 04:10