Is there a function in PHP that can decode Unicode escape sequences like "\u00ed
" to "í
" and all other similar occurrences?
I found similar question here but is doesn't seem to work.
A unicode escape sequence is a backslash followed by the letter 'u' followed by four hexadecimal digits (0-9a-fA-F). It matches a character in the target sequence with the value specified by the four digits. For example, ”\u0041“ matches the target sequence ”A“ when the ASCII character encoding is used.
The Difference Between Unicode and UTF-8Unicode is a character set. UTF-8 is encoding. Unicode is a list of characters with unique decimal numbers (code points).
UTF-8 is an encoding system for Unicode. It can translate any Unicode character to a matching unique binary string, and can also translate the binary string back to a Unicode character. This is the meaning of “UTF”, or “Unicode Transformation Format.”
Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit. UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units.
Try this:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) { return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE'); }, $str);
In case it's UTF-16 based C/C++/Java/Json-style:
$str = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) { return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE'); }, $str);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With