Appending an element onto a million-element ArrayList
has the cost of setting one reference now, and copying one reference in the future when the ArrayList
must be resized.
As I understand it, appending an element onto a million-element PersistenVector
must create a new path, which consists of 4 arrays of size 32. Which means more than 120 references have to be touched.
How does Clojure manage to keep the vector overhead to "2.5 times worse" or "4 times worse" (as opposed to "60 times worse"), which has been claimed in several Clojure videos I have seen recently? Has it something to do with caching or locality of reference or something I am not aware of?
Or is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?
I have tagged the question scala as well, since scala.collection.immutable.vector
is basically the same thing, right?
Clojure's PersistentVector's have special tail buffer to enable efficient operation at the end of the vector. Only after this 32-element array is filled is it added to the rest of the tree. This keeps the amortized cost low. Here is one article on the implementation. The source is also worth a read.
Regarding, "is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?", yes! These are known as transients in Clojure, and are used for efficient batch changes.
Cannot tell about Clojure, but I can give some comments about Scala Vectors.
Persistent Scala vectors (scala.collection.immutable.Vector
s) are much slower than an array buffer when it comes to appending. In fact, they are 10x slower than the List
prepend operation. They are 2x slower than appending to Conc-trees, which we use in Parallel Collections.
But, Scala also has mutable vectors -- they're hidden in the class VectorBuilder
. Appending to mutable vectors does not preserve the previous version of the vector, but mutates it in place by keeping the pointer to the rightmost leaf in the vector. So, yes -- keeping the vector mutable internally, and than returning an immutable reference is exactly what's done in Scala collections.
The VectorBuilder
is slightly faster than the ArrayBuffer
, because it needs to allocate its arrays only once, whereas ArrayBuffer
needs to do it twice on average (because of growing). Conc.Buffer
s, which we use as parallel array combiners, are twice as fast compared to VectorBuilder
s.
Benchmarks are here. None of the benchmarks involve any boxing, they work with reference objects to avoid any bias:
List
, Vector
and Conc
ArrayBuffer
, VectorBuilder
and Conc.Buffer
More collections benchmarks here.
These tests were executed using ScalaMeter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With