Convert from "Java escape" to Unicode code point in PHP

Tags: php, unicode, utf-8

Is there any way to convert a string containing a Java escape into its Unicode code point (index) in PHP?

I have this string:

$str = "\ud83d\ude0e";

and I need to obtain the code point, i.e. the part written after U+:

U+1F60E 

or, equivalently, the Python literal:

u'\U0001f60e'

The corresponding codes are listed at: http://www.charbase.com/1f60e-unicode-smiling-face-with-sunglasses

Thank you.

==== EDIT 09/03 ====

Sorry for the delay, and thank you for your reply, but I'm still not able to do what I need.

I need to replace the character with an image, so I do:

$src = "Hello "."\ud83d\ude0e";

$replaced = preg_replace("/\\\\u([0-9A-F]{1,8})/i", "&#x$1;", $src);

$replaced = str_replace('&#x1f60e', '<img src="data/emoji_new/1F60E.png">', $replaced);

$result = mb_convert_encoding($replaced, "UTF-8", "HTML-ENTITIES");

But it doesn't work. The result is:

"Hello ��"

Any other ideas?

Thank you again!

Randolf asked Feb 28 '13 18:02


1 Answer

This is very similar to the question "PHP: Convert unicode codepoint to UTF-8".

If you can, go straight from the full code point (the 4-byte character):

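$src = "Hello \u0001f60e"; // a literal \u escape holding the full code point

// Turn each \u escape (up to 8 hex digits) into an HTML numeric entity: &#x0001f60e;
$replaced = preg_replace("/\\\\u([0-9A-F]{1,8})/i", "&#x$1;", $src);

// Let mbstring decode the entity into UTF-8
$result = mb_convert_encoding($replaced, "UTF-8", "HTML-ENTITIES");

echo "Result is [$result] and its length is ".strlen($result)." bytes";

This outputs the following, which may well not display properly in your browser, depending on its emoji support:

Result is [Hello 😎] and its length is 10 bytes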
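On PHP 8.2 and later, using mbstring's HTML-ENTITIES pseudo-encoding (as in the mb_convert_encoding call above) is deprecated. A minimal sketch of the same "full code point" approach without it, assuming PHP 7.2+ for mb_chr; the function name decodeCodePointEscapes is hypothetical:

```php
<?php
// A sketch, assuming PHP >= 7.2 (mb_chr) and that each escape is a full code
// point written as \u plus up to 8 hex digits, as in "\u0001f60e" above.
// Caveat: the greedy {1,8} means an escape must not be directly followed by
// further hex digits, or they will be swallowed into the code point.
function decodeCodePointEscapes(string $src): string
{
    return preg_replace_callback(
        '/\\\\u([0-9A-Fa-f]{1,8})/',
        function (array $m): string {
            // hexdec('0001f60e') is 0x1F60E; mb_chr encodes it as UTF-8 bytes.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr returns false for values it cannot encode (e.g. above
            // U+10FFFF); in that case the original escape is left untouched.
            return $char === false ? $m[0] : $char;
        },
        $src
    );
}

$result = decodeCodePointEscapes('Hello \u0001f60e');
echo "Result is [$result]: " . mb_strlen($result, 'UTF-8') . " characters, "
    . strlen($result) . " bytes\n";
```

For the example string this reports 7 characters and 10 bytes: mb_strlen with an explicit 'UTF-8' counts the emoji once, while strlen counts its four bytes.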

Or, starting from the two UTF-16 code units (the surrogate pair):

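$src = "Hello "."\ud83d\ude0e"; // literal escapes for the UTF-16 surrogate pair

// Each 4-digit escape becomes its own entity: &#xd83d;&#xde0e;
$replaced = preg_replace("/\\\\u([0-9A-F]{1,4})/i", "&#x$1;", $src);

// Decode the entities into UTF-16, so the two surrogate halves end up next to each other as one pair...
$result = mb_convert_encoding($replaced, "UTF-16", "HTML-ENTITIES");

// ...then convert that UTF-16 string to UTF-8
$result = mb_convert_encoding($result, 'utf-8', 'utf-16');

echo "Result is [$result] and its length is ".strlen($result)." bytes\n";

// Dump the UTF-8 bytes in hex to confirm the 4-byte sequence for the emoji
$resultInHex = unpack('H*', $result);

$resultInHex = $resultInHex[1];

$resultSeparated = implode(', ', str_split($resultInHex, 2));

echo "in hex: ".$resultSeparated;

Outputs:

Result is [Hello 😎] and its length is 10 bytes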
in hex: 48, 65, 6c, 6c, 6f, 20, f0, 9f, 98, 8e
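The attempt in the question fails because its regex turns the escapes into two separate entities, &#xd83d;&#xde0e;, so str_replace('&#x1f60e', ...) never matches, and a lone surrogate cannot be encoded as valid UTF-8 on its own, which is what shows up as ��. A sketch that first combines each surrogate pair into its code point and then lets a callback decide what to emit (the U+ notation, the UTF-8 character, or the <img> tag from the edit). It assumes PHP 7.4+ (arrow functions, mb_chr); the helper names are hypothetical, and the image naming follows the question's data/emoji_new/1F60E.png:

```php
<?php
// A sketch, assuming PHP >= 7.4 (arrow functions, mb_chr) and input that holds
// literal Java-style escapes: \u plus exactly four hex digits, with characters
// above U+FFFF written as a high/low surrogate pair such as \ud83d\ude0e.
// The helper names below are hypothetical, not part of any library.

/** Combine a UTF-16 high/low surrogate pair into one Unicode code point. */
function surrogatePairToCodePoint(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

/** Encode a code point as UTF-8, using U+FFFD if mb_chr cannot encode it. */
function codePointToUtf8(int $cp): string
{
    $char = mb_chr($cp, 'UTF-8');
    return $char === false ? "\u{FFFD}" : $char;
}

/**
 * Replace every escape, a surrogate pair or a single BMP code unit, with
 * whatever $render returns for the decoded code point.
 */
function replaceJavaEscapes(string $src, callable $render): string
{
    // Try a high+low pair first, otherwise fall back to a single \uXXXX.
    $pattern = '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})|\\\\u([0-9a-f]{4})/i';

    return preg_replace_callback($pattern, function (array $m) use ($render): string {
        if (isset($m[3]) && $m[3] !== '') {
            return $render(hexdec($m[3]));              // single code unit
        }
        return $render(surrogatePairToCodePoint(hexdec($m[1]), hexdec($m[2])));
    }, $src);
}

// Hypothetical policy: characters above U+FFFF (most emoji) become images named
// after their upper-case hex code point, like the question's 1F60E.png.
$toImage = fn(int $cp): string => $cp >= 0x10000
    ? sprintf('<img src="data/emoji_new/%X.png">', $cp)
    : codePointToUtf8($cp);

$src = 'Hello \ud83d\ude0e';
echo replaceJavaEscapes($src, fn(int $cp): string => sprintf('U+%04X', $cp)), "\n";
echo replaceJavaEscapes($src, 'codePointToUtf8'), "\n";
echo replaceJavaEscapes($src, $toImage), "\n";
```

For 'Hello \ud83d\ude0e' the three calls print Hello U+1F60E, Hello 😎 and Hello <img src="data/emoji_new/1F60E.png">. Keeping the rendering in a callback means one decoding step serves both the original question and the image replacement from the edit.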

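For anyone wondering what a "Java escape" is: Java represents strings as sequences of UTF-16 code units, so a \uXXXX escape names a single 16-bit unit, and a character above U+FFFF, such as U+1F60E, has to be written as a surrogate pair like \ud83d\ude0e.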
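The standard UTF-16 decoding rule recovers the code point from a high/low surrogate pair. Worked through for the question's pair, it gives the 1F60E that appears in both U+1F60E and Python's u'\U0001f60e':

```latex
\begin{aligned}
\mathrm{cp} &= \texttt{0x10000} + (\mathit{high} - \texttt{0xD800}) \times \texttt{0x400} + (\mathit{low} - \texttt{0xDC00}) \\
            &= \texttt{0x10000} + (\texttt{0xD83D} - \texttt{0xD800}) \times \texttt{0x400} + (\texttt{0xDE0E} - \texttt{0xDC00}) \\
            &= \texttt{0x10000} + \texttt{0x3D} \times \texttt{0x400} + \texttt{0x20E} \\
            &= \texttt{0x10000} + \texttt{0xF400} + \texttt{0x20E} = \texttt{0x1F60E}
\end{aligned}
```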

Danack answered Oct 03 '22 12:10