
How do I convert Unicode escape sequences to text in PHP?

I have this Unicode sequence: \u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059. How do I convert it into text?

$unicode = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';

I tried:

echo utf8_decode($unicode);

and I tried:

echo mb_convert_encoding($unicode , 'US-ASCII', 'UTF-8');

and I tried:

echo htmlentities($unicode , ENT_COMPAT, "UTF-8");

but none of these functions convert the sequence into the corresponding Japanese text.

learntosucceed asked Jun 28 '15 08:06



4 Answers

The issue here is that the string is not Unicode text. It is a series of escape sequences that write Unicode code points using only ASCII characters (so it is 7-bit safe).

There is a simple trick for this: use PHP's JSON decoder:

<?php
$sequence = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
print_r(json_decode('["'.$sequence.'"]'));

The output is:

Array
(
    [0] => おはようございます
)

This means you can define a simple convenience function:

<?php
$sequence = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';

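function decode($payload) {
  // array_pop() takes its argument by reference, so store the decoded array in a variable first
  $decoded = json_decode('["'.$payload.'"]');
  return $decoded[0];
}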

echo decode($sequence);

You will want to add error handling and to escape JSON-specific characters (double quotes, control characters) inside the payload. This simple example is only meant to point you in the right direction...
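The error handling and escaping mentioned above could look roughly like the following sketch. The function name decodeUnicodeEscapes is hypothetical, the code assumes PHP 7 or later, and it assumes that every \uXXXX in the input is meant as an escape sequence. It lets json_encode() do the escaping, then restores only the \uXXXX sequences before decoding.

```php
<?php
// Hypothetical helper: decode \uXXXX escapes via the JSON decoder,
// with escaping of quotes/control characters and basic error handling.
function decodeUnicodeEscapes(string $payload): string
{
    // json_encode() escapes quotes, control characters and backslashes,
    // so the payload becomes a valid JSON string literal.
    $json = json_encode($payload, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    if ($json === false) {
        throw new InvalidArgumentException('Cannot encode input: ' . json_last_error_msg());
    }

    // json_encode() turned each \uXXXX into \\uXXXX; undo that for these
    // sequences only, so the decoder sees them as real escapes again.
    $json = preg_replace('/\\\\\\\\(u[0-9a-fA-F]{4})/', '\\\\$1', $json);

    $decoded = json_decode($json);
    if (json_last_error() !== JSON_ERROR_NONE || !is_string($decoded)) {
        throw new InvalidArgumentException('Cannot decode input: ' . json_last_error_msg());
    }

    return $decoded;
}

echo decodeUnicodeEscapes('\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059');
```

Because the escaping is delegated to json_encode(), a payload that contains double quotes or line breaks no longer breaks the JSON wrapper.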

Have fun!

arkascha answered Nov 04 '22 14:11


The Transliterator class from the intl extension can handle the conversion with its predefined Hex-Any identifier:

$in = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
$out = transliterator_create('Hex-Any')->transliterate($in);
var_dump($out); # string(27) "おはようございます"

julp answered Nov 04 '22 12:11


$unicode = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
$json = sprintf('"%s"',$unicode); # build json string

$utf8_str = json_decode ( $json, true ); # json decode
echo $utf8_str; # おはようございます

See the definition of a JSON string.

PHPJungle answered Nov 04 '22 14:11


PHP 7+

As of PHP 7, you can use the Unicode codepoint escape syntax in a double-quoted string literal to do this:

echo "\u{304a}\u{306f}\u{3088}\u{3046}\u{3054}\u{3056}\u{3044}\u{307e}\u{3059}";

This outputs おはようございます.
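The \u{...} syntax is resolved when PHP parses a double-quoted literal, so it does not by itself convert a string that already contains backslash sequences at run time, like the one in the question. For that case, a regex-based sketch along the following lines may help. The function name unescapeCodepoints is hypothetical, and the code assumes PHP 7.2 or later with the mbstring extension (for mb_chr). It does not combine surrogate pairs; such sequences are left unchanged.

```php
<?php
// Hypothetical helper: replace each \uXXXX in a run-time string
// with the UTF-8 character for that code point.
function unescapeCodepoints(string $text): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',   // a literal backslash, "u", four hex digits
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // mb_chr() returns false for surrogate code points;
            // keep the original escape text in that case.
            return $char === false ? $m[0] : $char;
        },
        $text
    );
}

echo unescapeCodepoints('\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059');
```

Unlike the json_decode approach, this leaves everything else in the string untouched, including quotes and other backslash sequences.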

Rabin Lama Dong answered Nov 04 '22 12:11