 

Shouldn't JSON.stringify escape Unicode characters?

I have a simple test page in UTF-8 where text with letters in multiple different languages gets stringified to JSON:

http://jsfiddle.net/Mhgy5/

HTML:

<textarea id="txt">
検索 • Busca • Sök • 搜尋 • Tìm kiếm • Пошук • Cerca • Søk • Haku • Hledání • Keresés • 찾기 • Cari • Ara • جستجو • Căutare • بحث • Hľadať • Søg • Serĉu • Претрага • Paieška • Poišči • Cari • חיפוש • Търсене • Іздеу • Bilatu • Suk • Bilnga • Traži • खोजें
</textarea>
<button id="encode">Encode</button>
<pre id="out">
</pre>

JavaScript:

$("#encode").click(function () {
    $("#out").text(JSON.stringify({ txt: $("#txt").val() }));
}).click();

While I expect the non-ASCII characters to be escaped as \uXXXX as per the JSON spec, they seem to be untouched. Here's the output I get from the above test:

{"txt":"検索 • Busca • Sök • 搜尋 • Tìm kiếm • Пошук • Cerca • Søk • Haku • Hledání • Keresés • 찾기 • Cari • Ara • جستجو • Căutare • بحث • Hľadať • Søg • Serĉu • Претрага • Paieška • Poišči • Cari • חיפוש • Търсене • Іздеу • Bilatu • Suk • Bilnga • Traži • खोजें\n"}

I'm using Chrome, so it should be the native JSON.stringify implementation. The page's encoding is UTF-8. Shouldn't the non-ASCII characters be escaped?

What brought me to this test in the first place was noticing that jQuery.ajax doesn't seem to escape non-ASCII characters when they appear in a property of the data object. The characters seem to be transmitted as UTF-8.
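A quick way to see what this means for the size of a request, assuming the JSON text itself is sent as UTF-8 (which is what TextEncoder always produces, and what fetch and XMLHttpRequest.send do with a string body): a raw non-ASCII character costs 2 to 4 bytes, while a \uXXXX escape always costs 6 bytes (12 for a surrogate pair). The escaped payload below is written out by hand purely for comparison.

```javascript
// Raw JSON text, exactly as JSON.stringify returns it.
const payload = JSON.stringify({ a: "Привет!" });
// Hand-written all-ASCII equivalent, for comparison only.
const escapedPayload = '{"a":"\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"}';

const encoder = new TextEncoder(); // always encodes to UTF-8
const rawBytes = encoder.encode(payload);
const escapedBytes = encoder.encode(escapedPayload);

console.log(payload.length, rawBytes.length);            // 15 21
console.log(escapedPayload.length, escapedBytes.length); // 45 45

// Both texts still describe the same value.
console.log(JSON.parse(payload).a === JSON.parse(escapedPayload).a); // true
```

Each Cyrillic letter takes 2 bytes in UTF-8 but 6 bytes as an escape, so escaping is a size trade-off rather than a correctness requirement.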

Asked Sep 04 '12 21:09 by Ates Goral




1 Answer

Indeed, JSON.stringify does not escape non-ASCII characters:

JSON.stringify({a:"Привет!"})
{"a":"Привет!"}
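This is allowed by the format: RFC 8259 (section 7) only requires escaping the quotation mark, the reverse solidus and the control characters U+0000 to U+001F, and any other character may appear either raw or as \uXXXX. A minimal sketch showing that both spellings parse to the same value, and that JSON.stringify escapes quotes, backslashes and control characters while leaving other non-ASCII characters alone:

```javascript
// Two JSON texts for the same object: raw characters vs. \uXXXX escapes.
const raw = '{"a":"Привет!"}';
const escaped = '{"a":"\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"}';

// JSON.parse decodes the escapes, so both yield an identical string.
console.log(JSON.parse(raw).a === JSON.parse(escaped).a); // true

// Quote, newline and backslash are escaped; Cyrillic letters are not.
console.log(JSON.stringify('say "hi"\n\\')); // "say \"hi\"\n\\"
console.log(JSON.stringify("Привет"));       // "Привет"
```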

However, I ran into a problem when storing that JSON via Perl's DBD::mysql and then retrieving it again. I found it safer to follow the recommendation to escape all non-ASCII and non-printable characters as \uXXXX. Here is how:

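// Escape every character outside printable ASCII (0x20-0x7E) as \uXXXX.
// Without the /u flag, astral characters match as two UTF-16 code units (a valid surrogate pair).
function jsonEscapeUTF(s) {
  return s.replace(/[^\x20-\x7E]/g, c => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
}

jsonEscapeUTF(JSON.stringify({a: "Привет!"}))
{"a":"\u041f\u0440\u0438\u0432\u0435\u0442!"}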
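One caveat with the jsonEscapeUTF helper above: because it escapes every character outside printable ASCII, it is only safe on compact JSON.stringify output. With an indent argument, the structural newlines between members would become \u000a, which is not valid whitespace outside a string, so JSON.parse would reject the result. A sketch of a variant (the name jsonEscapeNonAscii is hypothetical) that only touches code units from U+007F upward, relying on JSON.stringify having already escaped control characters inside string values:

```javascript
// Hypothetical variant of jsonEscapeUTF: escape only UTF-16 code units >= 0x7F.
// JSON.stringify has already escaped control characters inside strings, so any
// ASCII whitespace left over (e.g. newlines from the indent argument) is
// structural JSON whitespace and must stay raw.
function jsonEscapeNonAscii(json) {
  // No /u flag: astral characters match as two code units, giving a surrogate pair.
  return json.replace(/[\u007f-\uffff]/g,
    c => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));
}

const data = { a: "Привет!", emoji: "😀", nested: { b: "Søk" } };
const pretty = jsonEscapeNonAscii(JSON.stringify(data, null, 2));

console.log(pretty);
console.log(/[^\x00-\x7e]/.test(pretty));      // false: output is pure ASCII
console.log(JSON.parse(pretty).emoji === "😀"); // true: round-trips
```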

Hopefully it will be helpful.

Answered Sep 16 '22 15:09 by okharch