I have a bunch of documents persisted in Apache Lucene with some names in Russian, and when I try to print them out they look like this: "\u0410\u0441\u043f\u0430\u0440", not like Cyrillic characters. The project is in Scala. I've tried to fix this with Apache Commons' unescapeJava method, but it didn't help. Are there any other options?
Updated: The project is written with the Spray framework and returns JSON like this:
{
"id" : 0,
"name" : "\u0410\u0441\u043f\u0430\u0440"
}
I'm going to try to infer exactly what you are doing. You are using Spray, so I gather that you are using its JSON library, spray-json. I therefore suppose that you have some instance of spray.json.JsObject, and that what you posted in your question is the output you get when printing that instance.
Your JSON object is correct: the value of the name field has no embedded escaping. It is actually the conversion to a string that escapes some Unicode characters. See the definition of printString here:
https://github.com/spray/spray-json/blob/master/src/main/scala/spray/json/JsonPrinter.scala
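You can check this quickly in a REPL with spray-json on the classpath (a small sketch, using one of the names from your example; the output shown is approximate):

import spray.json._

// The JsString value itself holds the real Cyrillic characters...
val name = JsString("Аспар")
// ...it is only the printer that turns them into escape sequences:
name.compactPrint
// -> "\u0410\u0441\u043f\u0430\u0440"   (surrounding quotes included)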
I will also assume that when you tried to use unescapeJava, you applied it to the value of the name field, creating a new spray.json.JsObject instance that you then printed as before. Given that your JSON object does not actually contain any escaping, this did absolutely nothing; when you print it, the printer escapes the characters as before, and you're back to square one.
As a side note, it's worth mentioning that the JSON spec does not mandate how characters are encoded: they can be stored either as their literal value or as a Unicode escape. For example, the string "abc" could be written as just "abc", or as "\u0061\u0062\u0063". Either form is correct; it just happens that the author of spray-json decided to use the latter form for all non-ASCII characters.
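A quick way to convince yourself of this with spray-json itself (a sketch): parse the escaped output back and compare it with the original value.

import spray.json._

// Printing escapes the Cyrillic letters, but the escaped form still
// describes exactly the same string, so parsing it back restores the value.
val obj = JsObject("name" -> JsString("Аспар"))
JsonParser(obj.compactPrint) == obj   // true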
So now you ask: what can I do to work around this? You could ask the spray-json author to add an option that lets you specify that you don't want any Unicode escaping. But I imagine that you want a solution right now.
The simplest thing to do is to convert your object to a string (via JsValue.toString, JsValue.compactPrint or JsValue.prettyPrint), and then pass the result to unescapeJava. At least this will give you back your original Cyrillic characters.
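For example (a rough sketch, again assuming Commons Lang 3's StringEscapeUtils; the json value is a stand-in for whatever your route produces):

import org.apache.commons.lang3.StringEscapeUtils.unescapeJava
import spray.json._

val json: JsValue = JsObject("id" -> JsNumber(0), "name" -> JsString("Аспар"))

// Print first, then undo the unicode escapes over the whole document.
// Quick and dirty -- see the caveat just below.
val readable: String = unescapeJava(json.compactPrint)
// {"id":0,"name":"Аспар"}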
But this is a bit gross, and actually quite dangerous, as some characters are not safe to unescape inside a string literal. For example, \n will be unescaped to an actual newline, and \u0022 will be unescaped to ". You can easily see how this would break your JSON document.
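To make that concrete, reusing the imports from the previous snippet:

// A perfectly legal value that happens to contain a double quote...
val tricky = JsObject("quote" -> JsString("he said \"hi\""))
tricky.compactPrint
// -> {"quote":"he said \"hi\""}     valid JSON
unescapeJava(tricky.compactPrint)
// -> {"quote":"he said "hi""}       no longer valid JSON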
But at the very least it will allow you to confirm my theory (remember that I have been making assumptions about what exactly you are doing).
Now for a proper fix: you could simply extend JsonPrinter and override its printString method to remove the Unicode escaping. Something like this (untested):
import scala.annotation.tailrec
import spray.json._

// A JsonPrinter that escapes only what JSON requires (quotes, backslashes
// and the common control characters) and leaves everything else, including
// non-ASCII characters, untouched.
trait NoUnicodeEscJsonPrinter extends JsonPrinter {

  override protected def printString(s: String, sb: StringBuilder) {
    @tailrec
    def printEscaped(s: String, ix: Int) {
      if (ix < s.length) {
        s.charAt(ix) match {
          case '"'  => sb.append("\\\"")
          case '\\' => sb.append("\\\\")
          case x if 0x20 <= x && x < 0x7F => sb.append(x)
          case '\b' => sb.append("\\b")
          case '\f' => sb.append("\\f")
          case '\n' => sb.append("\\n")
          case '\r' => sb.append("\\r")
          case '\t' => sb.append("\\t")
          case x    => sb.append(x) // anything else, incl. non-ASCII, goes through as-is
        }
        printEscaped(s, ix + 1)
      }
    }
    sb.append('"')
    printEscaped(s, 0)
    sb.append('"')
  }
}

trait NoUnicodeEscPrettyPrinter extends PrettyPrinter with NoUnicodeEscJsonPrinter
object NoUnicodeEscPrettyPrinter extends NoUnicodeEscPrettyPrinter

trait NoUnicodeEscCompactPrinter extends CompactPrinter with NoUnicodeEscJsonPrinter
object NoUnicodeEscCompactPrinter extends NoUnicodeEscCompactPrinter
Then you can do:
val json: JsValue = ...
val jsonString: String = NoUnicodeEscPrettyPrinter(json)
jsonString will contain your JSON document in pretty-printed form, without any Unicode escaping.
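If you then want this output to reach your clients, one simple option is to bypass the default marshalling and complete with a response you build yourself. A sketch only: the spray-http names below (ContentTypes.`application/json`, HttpEntity, HttpResponse) are written from memory, so check them against your Spray version.

import spray.http.{ContentTypes, HttpEntity, HttpResponse}
import spray.json._

// Build the response yourself so the default spray-json marshaller
// (and its default printer) never gets involved.
def jsonResponse(json: JsValue): HttpResponse =
  HttpResponse(entity = HttpEntity(ContentTypes.`application/json`, NoUnicodeEscPrettyPrinter(json)))

// ...and inside a spray-routing route:
// complete(jsonResponse(json))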