How to serialize JSON with json4s with UTF-8 characters?

I have a really simple example:

import org.json4s._
import org.json4s.native.JsonMethods._
import org.json4s.JsonDSL._

val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))

println(pretty(render(json)))

What I get out of that is:

{
  "english":"serialization",
  "japanese":"\u30b7\u30ea\u30a2\u30e9\u30a4\u30bc\u30fc\u30b7\u30e7\u30f3"
}

What I want is this (perfectly valid AFAIK) JSON:

{
  "english":"serialization",
  "japanese":"シリアライゼーション"
}

I can't find it now, but I believe JSON only requires a few characters to be escaped inside strings: the quotation mark, the backslash, and the control characters U+0000 through U+001F.
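To make that rule concrete, here is a minimal sketch in plain Scala (no json4s) of a JSON string escaper that escapes only the mandatory characters. Behind a flag, it also writes every non-ASCII character as \uXXXX, which reproduces the escaped output shown above. The helper name `escapeJsonString` and its `escapeNonAscii` parameter are hypothetical and chosen for illustration; this is not how json4s implements `render`.

```scala
object JsonEscapeSketch {
  // Hypothetical helper, not part of json4s: escapes `s` for use inside a
  // JSON string literal (without the surrounding quotes).
  // Mandatory escapes: '"', '\' and the control characters U+0000..U+001F.
  // If escapeNonAscii is true, every UTF-16 code unit above U+007F is also
  // written as a backslash-u escape with four lowercase hex digits.
  def escapeJsonString(s: String, escapeNonAscii: Boolean = false): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case '\b' => sb.append("\\b")
      case '\f' => sb.append("\\f")
      case c if c < ' ' || (escapeNonAscii && c.toInt > 0x7f) =>
        // Remaining control characters (and, optionally, non-ASCII ones)
        // become backslash-u escapes.
        sb.append("\\u").append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val japanese = "シリアライゼーション"
    println(escapeJsonString(japanese))                        // unchanged
    println(escapeJsonString(japanese, escapeNonAscii = true)) // escaped form
  }
}
```

With the flag off, the Japanese text passes through untouched; with it on, each katakana character becomes one escape, matching the output json4s produced.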

Looking at the code for render, it appears that Strings always get this extra \uXXXX escaping for non-ASCII characters.

Does anyone know how I can get valid JSON without escaping all the non-ASCII characters? This seems very similar to: Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?


Update: It turns out this is an open issue in json4s with a pending PR #327, which was closed in favor of PR #339, which in turn was merged into the 3.4 release branch in a commit on Feb 13, 2016.

user107057 asked Feb 03 '16 04:02


1 Answer

@0__, it is not clear what answer you want to get with your bounty. The bug mentioned in the original question has already been fixed, so you can choose whether you want Unicode characters to be escaped or not. You just need to build with a current version, e.g. with a build.sbt like this:

name := "SO_ScalaJson4sUnicodeChars"
version := "1.0"
scalaVersion := "2.12.1"
libraryDependencies += "org.json4s" %% "json4s-native" % "3.5.1"

As @kriegaex mentioned in his comment, UTF-8 is the default encoding for JSON according to RFC 7159, so escaping non-ASCII characters is not strictly necessary. This is why json4s does not escape them by default, just as the OP requested:

package so

import org.json4s.JsonDSL._
import org.json4s._
import org.json4s.native.JsonMethods._

object SOTest extends App {
  val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))
  println(pretty(render(json)))
}

Console log:

{
  "english":"serialization",
  "japanese":"シリアライゼーション"
}
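Both outputs are valid JSON and denote the same string value; escaping only changes how characters are spelled in the text, not what a parser reads back. As a rough illustration in plain Scala (not json4s), the sketch below decodes the \uXXXX escapes of the escaped form and compares the result with the raw string. `decodeUnicodeEscapes` is a hypothetical, deliberately simplified helper; a real JSON parser also handles the other escape sequences and escaped backslashes.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

object UnicodeEscapeEquivalence {
  // Matches a literal backslash, the letter 'u' and four hex digits.
  private val UnicodeEscape: Regex = """\\u([0-9a-fA-F]{4})""".r

  // Hypothetical, simplified decoder (not part of json4s): replaces each
  // backslash-u escape with the UTF-16 code unit it denotes. It ignores all
  // other JSON escapes and does not treat an escaped backslash specially,
  // so it is only meant for this illustration.
  def decodeUnicodeEscapes(s: String): String =
    UnicodeEscape.replaceAllIn(s, m =>
      Matcher.quoteReplacement(Integer.parseInt(m.group(1), 16).toChar.toString))

  def main(args: Array[String]): Unit = {
    val plain = "シリアライゼーション"
    // Build the escaped form from the code units of `plain`.
    val escaped = plain.map(c => "\\u" + "%04x".format(c.toInt)).mkString
    println(escaped)
    println(decodeUnicodeEscapes(escaped) == plain) // true
  }
}
```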

However, if for some compatibility reason you need non-ASCII characters in the output to be escaped, json4s supports that as well. If you define your own implicit Formats (here customJsonFormats) that overrides alwaysEscapeUnicode, you get escaped output:

package so

import org.json4s.JsonDSL._
import org.json4s._
import org.json4s.native.JsonMethods._

object SOTest extends App {
  val json = ("english" -> JString("serialization")) ~ ("japanese" -> JString("シリアライゼーション"))
  implicit val customJsonFormats = new DefaultFormats {
    override def alwaysEscapeUnicode: Boolean = true
  }
  println(pretty(render(json)))
}

Console log:

{
  "english":"serialization",
  "japanese":"\u30b7\u30ea\u30a2\u30e9\u30a4\u30bc\u30fc\u30b7\u30e7\u30f3"
}

Update by @kriegaex: I decided to edit this answer, merging in some information from my own one and fixing a few minor issues. I did this so as to avoid redundancy. I am more interested in a good, consistent answer than in the bounty. I am going to delete mine now.

SergGr answered Oct 22 '22 21:10