I'm running the following Scala code:
    import scala.util.parsing.json._
    import scala.io._

    object Main {
      def jsonStringMap(str: String) =
        JSON.parseFull(str) match {
          case Some(m: Map[_,_]) => m collect {
            // If this doesn't match, we'll just ignore the value
            case (k: String, v: String) => (k,v)
          } toMap
          case _ => Map[String,String]()
        }

      def main(args: Array[String]) {
        val fh = Source.fromFile("listings.txt")
        try {
          fh.getLines map(jsonStringMap) foreach { v => println(v) }
        } finally {
          fh.close
        }
      }
    }
On my machine it takes ~3 minutes on the file from http://sortable.com/blog/coding-challenge/. Equivalent Haskell and Ruby programs I wrote take under 4 seconds. What am I doing wrong?
I tried the same code without the map(jsonStringMap) and it was plenty fast, so is the JSON parser just really slow?
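One way to confirm that the time goes into parsing rather than file I/O is to time the same line loop with and without the per-line function. The following is only a rough sketch using the standard library; `TimeIt`, `timed`, and `countLines` are hypothetical names introduced for illustration, and a single wall-clock measurement on the JVM is noisy (no warm-up), so treat the numbers as indicative only.

```scala
import scala.io.Source

// Hypothetical helper object, not part of the original program.
object TimeIt {
  // Runs `body` once and returns its result plus elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Applies `f` to every line of the file at `path`, discards the results,
  // and returns the number of lines processed.
  def countLines[A](path: String)(f: String => A): Int = {
    val fh = Source.fromFile(path)
    try {
      fh.getLines().foldLeft(0) { (n, line) =>
        f(line) // force the per-line work so it is included in the timing
        n + 1
      }
    } finally {
      fh.close()
    }
  }
}
```

Comparing `TimeIt.timed(TimeIt.countLines("listings.txt")(identity))` with `TimeIt.timed(TimeIt.countLines("listings.txt")(Main.jsonStringMap))` should show roughly how much of the total run time is spent in the parser.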
It does seem likely that the default JSON parser is just really slow. However, I tried https://github.com/stevej/scala-json, and while that gets it down to 35 seconds, that's still much slower than Ruby.
I am now using https://github.com/codahale/jerkson which is even faster! My program now runs in only 6 seconds on my data, only 3 seconds slower than Ruby, which is probably just the JVM starting up.
A quick look at the scala-user archive seems to indicate that nobody is doing serious work with the JSON parser in the Scala standard library.
See http://groups.google.com/group/scala-user/msg/fba208f2d3c08936
It seems the parser ended up in the standard library at a time when Scala was less in the spotlight and didn't have the expectations it has today.
Use Jerkson. Jerkson is built on Jackson, which is generally among the fastest JSON libraries on the JVM (especially when stream reading/writing large documents).