I have this Unicode sequence: \u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059. How do I convert it into text?
$unicode = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
I tried:
echo utf8_decode($unicode);
and I tried:
echo mb_convert_encoding($unicode , 'US-ASCII', 'UTF-8');
and I tried:
echo htmlentities($unicode , ENT_COMPAT, "UTF-8");
but none of these functions convert the sequence into the corresponding Japanese text.
A Unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence whose code point is the value given by the four digits. For example, "\u0041" matches the target sequence "A", because U+0041 is the code point of "A" (the same value it has in ASCII).
PHP offers a similar mechanism in double-quoted strings through its escape character, the backslash \. An escape sequence, the combination of the escape character and the characters that follow it, signals that those characters should be treated specially. Note that PHP does not interpret the four-digit \uXXXX form; since PHP 7, the braced form \u{XXXX} is recognized instead.
Escape sequences apply to double-quoted strings. A single-quoted string only recognizes the escape sequences for a single quote (\') and a backslash (\\).
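To make the difference between the quoting styles concrete, here is a small sketch that compares string lengths. The variable names are only illustrative, and the last check assumes PHP 7.0 or later.

```php
<?php
// Single quotes: only \' and \\ are special, so everything else stays literal.
$single = 'A\n\x41';
// Double quotes: \n becomes a newline and \x41 becomes "A".
$double = "A\n\x41";

var_dump(strlen($single)); // int(7)
var_dump(strlen($double)); // int(3)

// PHP does not interpret the four-digit \uXXXX form, even in double quotes.
$fourDigit = "\u304a";
var_dump(strlen($fourDigit)); // int(6): backslash, u, 3, 0, 4, a

// The braced form is interpreted in double quotes (assumes PHP 7.0 or later).
$braced = "\u{304a}";
var_dump(strlen($braced)); // int(3): the UTF-8 bytes of お
```

This is why the string from the question stays unchanged no matter which quotes are used: the four-digit form is plain text as far as PHP is concerned.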
The issue here is that the string is not Unicode text. It is a series of escape sequences that writes Unicode characters down using only ASCII characters (so it is 7-bit safe).
There is a simple trick: use PHP's JSON decoder for this:
<?php
$sequence = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
print_r(json_decode('["'.$sequence.'"]'));
The output is:
Array
(
[0] => おはようございます
)
This means you can define a simple convenience function:
<?php
$sequence = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
function decode($payload) {
return json_decode('["'.$payload.'"]')[0];
}
echo decode($sequence);
You will want to add error handling and escaping of JSON-specific control characters inside the payload. This simple example is just meant to point you in the right direction...
Have fun!
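One possible way to handle the error cases and escaping concerns mentioned above is sketched here. Instead of escaping the whole payload, it swaps in a slightly different technique: only runs of \uXXXX escapes are handed to json_decode(), so quotes and other characters in the surrounding text never reach the JSON parser. The function name decodeUnicodeEscapes is hypothetical, and the sketch assumes PHP 7.0 or later.

```php
<?php
// Hypothetical helper: decodes \uXXXX escapes and leaves all other text untouched.
function decodeUnicodeEscapes(string $payload): string
{
    $result = preg_replace_callback(
        // One or more consecutive \uXXXX escapes; matching a whole run keeps
        // UTF-16 surrogate pairs together.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $match): string {
            $decoded = json_decode('"' . $match[0] . '"');
            // If the run cannot be decoded (for example an unpaired
            // surrogate), keep the original text instead of failing.
            return is_string($decoded) ? $decoded : $match[0];
        },
        $payload
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $payload;
}

echo decodeUnicodeEscapes('He said "\u304a\u306f\u3088\u3046"'), "\n";
```

With this input the quotes around the escapes pass through unchanged, which the plain string concatenation approach would not tolerate.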
The Transliterator class from the intl extension can handle the conversion with its predefined Hex-Any identifier:
$in = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
$out = transliterator_create('Hex-Any')->transliterate($in);
var_dump($out); # string(27) "おはようございます"
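Because the intl extension is not always installed, a guarded variant may be useful. The following is only a sketch: the function name hexAnyDecode and the choice to throw RuntimeException are assumptions, not part of the intl API.

```php
<?php
// Hypothetical wrapper around the Hex-Any transliterator with basic checks.
function hexAnyDecode(string $in): string
{
    if (!function_exists('transliterator_create')) {
        throw new RuntimeException('The intl extension is not loaded.');
    }

    // transliterator_create() returns null if the identifier is not usable.
    $transliterator = transliterator_create('Hex-Any');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create the Hex-Any transliterator.');
    }

    // transliterate() returns false on failure.
    $out = $transliterator->transliterate($in);
    if ($out === false) {
        throw new RuntimeException('Transliteration failed.');
    }

    return $out;
}

if (function_exists('transliterator_create')) {
    echo hexAnyDecode('\u304a\u306f\u3088\u3046'), "\n";
}
```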
$unicode = '\u304a\u306f\u3088\u3046\u3054\u3056\u3044\u307e\u3059';
$json = sprintf('"%s"',$unicode); # build json string
$utf8_str = json_decode ( $json, true ); # json decode
echo $utf8_str; # おはようございます
See Json string
As of PHP 7, you can use the Unicode codepoint escape syntax to do this.
echo "\u{304a}\u{306f}\u{3088}\u{3046}\u{3054}\u{3056}\u{3044}\u{307e}\u{3059}";
outputs おはようございます.
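One difference from the four-digit form is worth illustrating: the braced syntax takes the code point directly, even above U+FFFF, whereas \uXXXX needs a UTF-16 surrogate pair for such characters. The sketch below assumes PHP 7.0 or later and uses U+1F600 purely as an example; the variable names are illustrative.

```php
<?php
// Braced syntax: one escape for any code point (assumes PHP 7.0 or later).
$fromLiteral = "\u{1F600}";

// Four-digit form: a UTF-16 surrogate pair, decoded here with json_decode().
$fromJson = json_decode('"\ud83d\ude00"');

var_dump($fromLiteral === $fromJson); // bool(true)
var_dump(strlen($fromLiteral));       // int(4): four UTF-8 bytes
```

Keep in mind that the braced syntax is resolved when PHP parses a string literal in source code, so it does not help with escape text that arrives at runtime, such as the string in the question.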