
How to convert Indic Characters to Unicode Escaped characters

I am currently designing a mobile application for Android. The text and content are in a local Indic language, Tamil. The Tamil equivalent of "Welcome" is வணக்கம். Since Android cannot display the Indic text, I am converting it using a service called JavaScript String Escape.

It works like this:

  • Input: வணக்கம்
  • Output: \u0BB5\u0BA3\u0B95\u0BCD\u0B95\u0BAE\u0BCD

How can I do this in JavaScript or PHP? I have a large amount of text to convert and put into JSON. Sample JSON:

{
  "title": "\u0BAE\u0BB0\u0BC1\u0BA4\u0BCD\u0BA4\u0BC1\u0BB5\u0BB0\u0BBF\u0BA9\u0BCD \u0BAA\u0BC6\u0BAF\u0BB0\u0BCD #1",
  "image": "http://www.exceptnothing.com/doctors/doc11.png",
  "rating": "\u2713 \u0B87\u0BAA\u0BCD\u0BAA\u0BC7\u0BBE\u0BA4\u0BC1 \u0BAA\u0BBE\u0BB0\u0BCD\u0B95\u0BCD\u0B95\u0BB2\u0BBE\u0BAE\u0BCD",
  "rating2": "",
  "releaseYear": "\u0BA8\u0BBE\u0BB3\u0BCD \u0BAE\u0BC1\u0BB4\u0BC1\u0BB5\u0BA4\u0BC1\u0BAE\u0BCD \u0BAA\u0BBE\u0BB0\u0BCD\u0B95\u0BCD\u0B95\u0BB2\u0BBE\u0BAE\u0BCD",
  "genre": ["\u25B6 \u0B87\u0BA4\u0BAF \u0BA8\u0BBF\u0BAA\u0BC1\u0BA3\u0BB0\u0BCD"]
}

I would also like to know how to decode the above JSON and show it as வணக்கம். Thanks in advance.

Anirudh M asked Jan 21 '16 10:01




1 Answer

In JavaScript, escape() gets close to what you are looking for; in PHP, json_encode() does it directly. Open your browser console and type the following:

escape("வணக்கம்")

And you will get the following back:

"%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD"

That covers the encoding direction, although in the %uXXXX form rather than the \uXXXX form shown in the question. To get the original வணக்கம் back from it, use unescape():

unescape("%u0BB5%u0BA3%u0B95%u0BCD%u0B95%u0BAE%u0BCD");

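Note: both escape() and unescape() are deprecated. Their usual replacements, encodeURIComponent() and decodeURIComponent(), produce UTF-8 percent-encoding instead (வ becomes %E0%AE%B5), so they are suitable for URLs but do not give the \uXXXX escapes needed here.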
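Since escape() emits %uXXXX rather than \uXXXX, one way to get the exact form from the question is to walk the string and escape each non-ASCII UTF-16 code unit yourself. The following is only a minimal sketch: toUnicodeEscapes is a hypothetical helper name, not a built-in, and it does not escape quotes or backslashes, so on its own it is not a complete JSON string encoder.

```javascript
// Hypothetical helper (not part of any library): converts every non-ASCII
// UTF-16 code unit to a \uXXXX escape and leaves ASCII characters untouched.
function toUnicodeEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code > 0x7f) {
      // Four uppercase hex digits, zero-padded, as in the question's output.
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    } else {
      out += text[i];
    }
  }
  return out;
}

console.log(toUnicodeEscapes("வணக்கம்"));
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is also how JSON represents them.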


Update for Server Side

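For encoding and decoding JSON on the server, it is better to use PHP's built-in functions. json_encode() escapes non-ASCII characters as \uXXXX by default (unless JSON_UNESCAPED_UNICODE is passed), which is the form you want, and json_decode() turns them back into the original text. Note that this is not the same output as escape(), which uses the %uXXXX form.

json_encode("வணக்கம்");
=> "\u0bb5\u0ba3\u0b95\u0bcd\u0b95\u0bae\u0bcd"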
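For the decoding half of the question on the JavaScript side, JSON.parse() already understands \uXXXX escapes, so no manual conversion should be needed. Going the other way, JSON.stringify() leaves non-ASCII characters as they are, so producing an ASCII-only file takes one extra pass. This is a rough sketch under those assumptions; toAsciiJson is a hypothetical helper name, and the "title" key is borrowed from the sample JSON in the question.

```javascript
// JSON.parse decodes \uXXXX escapes on its own.
const json = '{"title": "\\u0BB5\\u0BA3\\u0B95\\u0BCD\\u0B95\\u0BAE\\u0BCD"}';
const data = JSON.parse(json);
console.log(data.title); // the Tamil word from the question

// Hypothetical helper: serialize to JSON, then replace every non-ASCII
// UTF-16 code unit in the result with its \uXXXX escape.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const ascii = toAsciiJson(data);
console.log(ascii);
```

Because the replacement runs after JSON.stringify(), quotes and backslashes are already escaped correctly and the result stays valid JSON.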

Also, see JavaScript: Escaping Special Characters for more information. Hope this helps. :)

Praveen Kumar Purushothaman answered Sep 23 '22 14:09