Is there a function that converts a UTF-8 string to Unicode escape codes, leaving non-special characters as normal letters and numbers?
For example, the German word "tchüß" would be rendered as something like "tch\20AC\21AC" (please note that I am making the Unicode codes up).
EDIT: I am experimenting with the following function. It works well for ASCII 32-127, but it seems to fail for multibyte characters:
function strToHex ($string)
{
    $hex = '';
    for ($i = 0; $i < mb_strlen ($string, "utf-8"); $i++)
    {
        $id = ord (mb_substr ($string, $i, 1, "utf-8"));
        $hex .= ($id <= 128) ? mb_substr ($string, $i, 1, "utf-8") : "&#" . $id . ";";
    }
    return ($hex);
}
Any ideas?
EDIT 2: Found a solution: PHP's ord() function only returns the value of the first byte of its argument, so it does not work for multibyte characters. Use this instead: http://nl.php.net/manual/en/function.ord.php#78032
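For anyone landing here later, the fix can be sketched as below. This is not the code from the linked manual comment; it is a minimal variant that assumes PHP 7.2+ with the mbstring extension (for mb_ord()), and the function name strToEntities is made up for illustration.

```php
<?php
// Hypothetical helper: keeps ASCII as-is and turns every other
// character into a decimal numeric entity such as &#252;.
// Assumes PHP 7.2+ (mb_ord) and valid UTF-8 input.
function strToEntities(string $string): string
{
    $out = '';
    // The /u modifier makes preg_split() cut between UTF-8 characters, not bytes.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($chars as $char) {
        // mb_ord() returns the full code point, unlike ord(), which reads one byte.
        $codePoint = mb_ord($char, 'UTF-8');
        $out .= ($codePoint < 128) ? $char : '&#' . $codePoint . ';';
    }
    return $out;
}

echo strToEntities('tchüß'), "\n"; // tch&#252;&#223;
```

The comparison uses < 128 rather than <= 128 because ASCII ends at 127.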
For a readable form I would go with JSON. JSON does not require non-ASCII characters to be escaped, but PHP's json_encode() escapes them by default:
echo json_encode("tchüß");
"tch\u00fc\u00df"
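To make the default behaviour easier to see, here is a small sketch that compares it with the JSON_UNESCAPED_UNICODE flag and decodes the escaped form back. It only uses the standard json_encode() and json_decode() functions; the variable names are illustrative.

```php
<?php
$word = 'tchüß';

// Default: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($word);
echo $escaped, "\n"; // "tch\u00fc\u00df"

// With JSON_UNESCAPED_UNICODE the characters are left as they are.
$raw = json_encode($word, JSON_UNESCAPED_UNICODE);
echo $raw, "\n"; // "tchüß"

// Decoding the escaped form gives back the original UTF-8 string.
$decoded = json_decode($escaped);
echo $decoded, "\n"; // tchüß
```

Note that the encoded result includes the surrounding double quotes, since it is a complete JSON string literal.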
Since PHP 7, IntlChar::ord() (part of the intl extension) returns the Unicode code point of a given UTF-8 character:
var_dump(sprintf('U+%04X', IntlChar::ord('ß')));
# Outputs: string(6) "U+00DF"
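The same call can be applied to every character of a string. The following is only a sketch, assuming PHP 7+ with the intl extension loaded; the helper name codePoints is hypothetical.

```php
<?php
// Hypothetical helper: lists the code point of each character in U+XXXX notation.
// Assumes the intl extension is available and the input is valid UTF-8.
function codePoints(string $text): array
{
    // Split on UTF-8 character boundaries.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map(function ($char) {
        return sprintf('U+%04X', IntlChar::ord($char));
    }, $chars);
}

echo implode(' ', codePoints('tchüß')), "\n";
// U+0074 U+0063 U+0068 U+00FC U+00DF
```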