Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiency of appending to vectors

Appending an element onto a million-element ArrayList has the cost of setting one reference now, and copying one reference in the future when the ArrayList must be resized.

As I understand it, appending an element onto a million-element PersistenVector must create a new path, which consists of 4 arrays of size 32. Which means more than 120 references have to be touched.

How does Clojure manage to keep the vector overhead to "2.5 times worse" or "4 times worse" (as opposed to "60 times worse"), which has been claimed in several Clojure videos I have seen recently? Has it something to do with caching or locality of reference or something I am not aware of?

Or is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?

I have tagged the question scala as well, since scala.collection.immutable.vector is basically the same thing, right?

like image 617
fredoverflow Avatar asked Jun 28 '14 16:06

fredoverflow


2 Answers

Clojure's PersistentVector's have special tail buffer to enable efficient operation at the end of the vector. Only after this 32-element array is filled is it added to the rest of the tree. This keeps the amortized cost low. Here is one article on the implementation. The source is also worth a read.

Regarding, "is it somehow possible to build a vector internally with mutation and then turn it immutable before revealing it to the outside world?", yes! These are known as transients in Clojure, and are used for efficient batch changes.

like image 142
A. Webb Avatar answered Nov 15 '22 16:11

A. Webb


Cannot tell about Clojure, but I can give some comments about Scala Vectors.

Persistent Scala vectors (scala.collection.immutable.Vectors) are much slower than an array buffer when it comes to appending. In fact, they are 10x slower than the List prepend operation. They are 2x slower than appending to Conc-trees, which we use in Parallel Collections.

But, Scala also has mutable vectors -- they're hidden in the class VectorBuilder. Appending to mutable vectors does not preserve the previous version of the vector, but mutates it in place by keeping the pointer to the rightmost leaf in the vector. So, yes -- keeping the vector mutable internally, and than returning an immutable reference is exactly what's done in Scala collections.

The VectorBuilder is slightly faster than the ArrayBuffer, because it needs to allocate its arrays only once, whereas ArrayBuffer needs to do it twice on average (because of growing). Conc.Buffers, which we use as parallel array combiners, are twice as fast compared to VectorBuilders.

Benchmarks are here. None of the benchmarks involve any boxing, they work with reference objects to avoid any bias:

  • comparison of Scala List, Vector and Conc
  • comparison of Scala ArrayBuffer, VectorBuilder and Conc.Buffer

More collections benchmarks here.

These tests were executed using ScalaMeter.

like image 7
axel22 Avatar answered Nov 15 '22 15:11

axel22