
Why is this Scala code slow?

I'm running the following Scala code:

import scala.util.parsing.json._
import scala.io._

object Main {
  def jsonStringMap(str: String): Map[String, String] =
    JSON.parseFull(str) match {
      case Some(m: Map[_, _]) =>
        m.collect {
          // If this doesn't match, we'll just ignore the value
          case (k: String, v: String) => (k, v)
        }.toMap
      case _ => Map[String, String]()
    }

  def main(args: Array[String]): Unit = {
    val fh = Source.fromFile("listings.txt")
    try {
      fh.getLines().map(jsonStringMap).foreach(println)
    } finally {
      fh.close()
    }
  }
}

On my machine it takes ~3 minutes on the file from http://sortable.com/blog/coding-challenge/. Equivalent Haskell and Ruby programs I wrote take under 4 seconds. What am I doing wrong?

I tried the same code without the map(jsonStringMap) and it was plenty fast, so is the JSON parser just really slow?

It does seem likely that the default JSON parser is just really slow. However, I tried https://github.com/stevej/scala-json, and while that gets the run time down to 35 seconds, it's still much slower than Ruby.

I am now using https://github.com/codahale/jerkson which is even faster! My program now runs in only 6 seconds on my data, only 3 seconds slower than Ruby, which is probably just the JVM starting up.
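The comparison described above (running the loop with and without the parsing step, then swapping parsers) can be turned into a small timing harness. This is only a rough sketch: `timed` and `countParsed` are hypothetical helper names, the sample lines are made up rather than taken from listings.txt, and the `split` call is a stand-in for whichever parse function is being measured. A single wall-clock measurement on the JVM also includes JIT warm-up, so the numbers are indicative at best.

```scala
object ParseTiming {
  // Hypothetical helper: evaluates `body` once and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Applies `parse` to every line and counts the results.
  // Taking the size forces the lazy iterator, so the parse work really runs.
  def countParsed[A](lines: Iterator[String], parse: String => A): Int =
    lines.map(parse).size

  def main(args: Array[String]): Unit = {
    // Made-up sample input standing in for the real file.
    val sample = Vector.fill(10000)("""{"title":"camera","price":"10"}""")

    // Baseline: iterate over the lines without doing any parsing.
    val (n1, baselineMs) = timed(countParsed(sample.iterator, (s: String) => s))

    // Stand-in for the parser under test; replace with jsonStringMap or a library call.
    val (n2, parseMs) = timed(countParsed(sample.iterator, (s: String) => s.split(',').toList))

    println(s"baseline: $n1 lines in $baselineMs ms")
    println(s"with parse step: $n2 lines in $parseMs ms")
  }
}
```

If the baseline is fast and the second measurement dominates, the parse function is the bottleneck rather than the file reading or the printing.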

singpolyma asked Feb 23 '12 02:02

2 Answers

A quick look at the scala-user archive suggests that nobody is doing serious work with the JSON parser in the Scala standard library.

See http://groups.google.com/group/scala-user/msg/fba208f2d3c08936

It seems the parser ended up in the standard library at a time when Scala was less in the spotlight and didn't carry the expectations it does today.

huynhjl answered Sep 21 '22 02:09


Use Jerkson. Jerkson is built on Jackson, which is generally among the fastest JSON libraries on the JVM, especially when stream reading/writing large documents.

Steve answered Sep 23 '22 02:09