Why is Circe JSON decoding slower with implicit decoder lookup than with the implicit decoder saved to a val?
I would expect these to perform the same, because implicit resolution is done at compile time.
import io.circe._
import io.circe.generic.auto._
import io.circe.jackson
import io.circe.syntax._

// Decoder materialized once and stored in a val
private val decoder = implicitly[Decoder[Data.Type]]

// Passes the cached decoder explicitly
def decode(): Either[Error, Data.Type] = {
  jackson.decode[Data.Type](Data.json)(decoder)
}

// Lets the compiler resolve (and auto-derive) the decoder at the call site
def decodeAuto(): Either[Error, Data.Type] = {
  jackson.decode[Data.Type](Data.json)
}
[info] DecodeTest.circeJackson thrpt 200 69157.472 ± 283.285 ops/s
[info] DecodeTest.circeJacksonAuto thrpt 200 67946.734 ± 315.876 ops/s
The full repo can be found here: https://github.com/stephennancekivell/some-jmh-json-benchmarks-circe-jackson
Consider this much simpler case that doesn't involve circe or generic derivation at all:
package demo

import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.Throughput))
class OrderingBench {
  val items: List[(Char, Int)] = List('z', 'y', 'x').zipWithIndex
  val tupleOrdering: Ordering[(Char, Int)] = implicitly

  @Benchmark
  def sortWithResolved(): List[(Char, Int)] = items.sorted

  @Benchmark
  def sortWithVal(): List[(Char, Int)] = items.sorted(tupleOrdering)
}
On Scala 2.11 on my desktop machine I get this:
Benchmark Mode Cnt Score Error Units
OrderingBench.sortWithResolved thrpt 40 15940745.279 ± 102634.860 ops/s
OrderingBench.sortWithVal thrpt 40 16420078.932 ± 102901.418 ops/s
And if you look at allocations the difference is a little bigger:
Benchmark Mode Cnt Score Error Units
OrderingBench.sortWithResolved:gc.alloc.rate.norm thrpt 20 176.000 ± 0.001 B/op
OrderingBench.sortWithVal:gc.alloc.rate.norm thrpt 20 152.000 ± 0.001 B/op
You can tell what's going on by breaking out reify:
scala> val items: List[(Char, Int)] = List('z', 'y', 'x').zipWithIndex
items: List[(Char, Int)] = List((z,0), (y,1), (x,2))
scala> import scala.reflect.runtime.universe._
import scala.reflect.runtime.universe._
scala> showCode(reify(items.sorted).tree)
res0: String = $read.items.sorted(Ordering.Tuple2(Ordering.Char, Ordering.Int))
The Ordering.Tuple2 here is a generic method that instantiates an Ordering[(Char, Int)]. This is exactly the same thing that happens when we define our tupleOrdering, but the difference is that in the val case it happens once, while in the case where it's resolved implicitly it happens every time sorted is called.
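To make that concrete, here's a sketch of what the two benchmark methods effectively compile to, based on the showCode output above. The method names ending in Desugared are just for illustration; the explicit Ordering.Tuple2 call is the argument the compiler inserts, as shown by reify.

// Hypothetical desugared forms of the two benchmark methods (illustration only)

// sortWithResolved: a new Ordering[(Char, Int)] is built on every call
def sortWithResolvedDesugared(): List[(Char, Int)] =
  items.sorted(Ordering.Tuple2(Ordering.Char, Ordering.Int))

// sortWithVal: the Ordering[(Char, Int)] was built once, when tupleOrdering was initialized
def sortWithValDesugared(): List[(Char, Int)] =
  items.sorted(tupleOrdering)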
So the difference you're seeing is just the cost of instantiating the Decoder instance in every operation, as opposed to instantiating it a single time at the beginning, outside of the benchmarked code. This cost is relatively tiny, and for larger benchmarks it's going to be more difficult to see.
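If you want to avoid paying that per-call construction cost in the circe case, one common approach is to derive the decoder once and cache it as an implicit val, for example with circe's semiauto derivation. A minimal sketch, assuming Data.Type is a case class (the Decoders object and the val name dataDecoder are just for illustration):

import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder

object Decoders {
  // Derived once at initialization; implicit lookup then finds this cached
  // instance instead of rebuilding a Decoder[Data.Type] on every call.
  implicit val dataDecoder: Decoder[Data.Type] = deriveDecoder[Data.Type]
}

With that in implicit scope at the call site, the auto-derived and the explicitly-passed versions should perform essentially the same.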