Apologies if this is a duplicate - I did a few searches and didn't quite find what I need.
We have a performance critical piece of our application that converts a Play 2.0 Enumerator
(can be thought of as a Stream
) of incoming data to a List
(or similar). We will use the fold
method on Enumerator
and the question is what will be the most performant way to do it. (I will use Stream
instead of Enumerator
in the code, but the idea should be the same.)
val incoming: Stream[Int] = ???
val result: Seq[Int] = incoming.fold(Seq.empty)(_ + _)
val result2: Seq[Int] = incoming.fold(MutableList.empty(_ += _).toSeq
So the question is essentially, how does repeatedly appending to an immutable Vector
compare to repeatedly appending to a mutable MutableList
or ListBuffer
in performance critical code? We've thrown out just List
because we need O(1)
appending (not prepending). But does the mutable data-structure buy us anything in terms of performance or garbage collection?
You are probably best off using ArrayBuffer
. On my machine, you get about the following number of appends per second:
preallocated Array[Int] -- 830M
resized (x2) Array[Int] -- 263M
Vector.newBuilder + result -- 185M
mutable.ArrayBuffer -- 125M
mutable.ListBuffer -- 100M
mutable.MutableList -- 71M
immutable.List + reverse -- 68M
immutable.Vector -- 8M
I assume you're not always just storing ints, and you want all the collections goodness without extra wrappings, so ArrayBuffer
is the best-performing solution as long as you only need to append to one end. The lists support bidirectional addition and are comparable. Vector is horribly slow in comparison--only use it if you can take advantage of a lot of data sharing, or create it all in one swoop (see Vector.newBuilder
result, which is fantastic; it's a great data structure for access, iteration, and creation and sparing updates, not updates-all-the-time).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With