
How does Scala's Vector work?

I read this page about the time complexity of Scala collections. As it says, Vector's complexity is eC for all operations.

It made me wonder what Vector actually is. I read the documentation, which says:

Because vectors strike a good balance between fast random selections and fast random functional updates, they are currently the default implementation of immutable indexed sequences. It is backed by a little endian bit-mapped vector trie with a branching factor of 32. Locality is very good, but not contiguous, which is good for very large sequences.

As with everything else about Scala, it's pretty vague. How does Vector actually work?

Lai Yu-Hsuan asked Dec 16 '13 13:12


People also ask

How is Scala Vector implemented?

A Scala Vector is implemented as a balanced tree with values only at the leaves. Except instead of being binary, as is commonly seen, the trees used to implement a Vector are 32-ary: each node in the tree contains 32 children instead of just 2.

What is Scala Vector?

Vector is a new collection type in Scala 2.8 that addresses the inefficiency for random access on lists. Vectors allow accessing any element of the list in "effectively" constant time. It's a larger constant than for access to the head of a list or for reading an element of an array, but it's a constant nonetheless.


1 Answer

The keyword here is trie. Vector is implemented as a trie data structure. See http://en.wikipedia.org/wiki/Trie.

More precisely, it is a "bit-mapped vector trie". I've just found a reasonably concise description of the structure (along with an implementation, apparently in Rust) here:

https://bitbucket.org/astrieanna/bitmapped-vector-trie

The most relevant excerpt is:

A Bitmapped Vector Trie is basically a 32-tree. Level 1 is an array of size 32, of whatever data type. Level 2 is an array of 32 Level 1's. and so on, until: Level 7 is an array of 2 Level 6's.
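To make the excerpt concrete, here is a minimal sketch of such a 32-way trie in Scala. It is not the code of Scala's actual Vector; the names `TrieSketch`, `build`, `get` and `updated` are made up for illustration, and elements are stored as `Any` to keep the sketch short. It only shows the idea: each level consumes 5 bits of the index, and a functional update copies just the nodes on the path to the changed element.

```scala
object TrieSketch {
  val Bits  = 5            // 5 bits of the index per level
  val Width = 1 << Bits    // 32 slots per node
  val Mask  = Width - 1    // 0x1f, selects one 5-bit group

  // depth = number of levels; with depth 1 the root is itself a leaf array.
  final case class Trie(root: Array[Any], depth: Int, size: Int)

  // Build bottom-up: chunk the elements into leaves of 32,
  // then chunk the nodes of each level into parents of 32.
  def build(elems: Seq[Any]): Trie = {
    require(elems.nonEmpty, "sketch only handles non-empty input")
    var level: Vector[Array[Any]] =
      elems.grouped(Width).map(_.toArray[Any]).toVector
    var depth = 1
    while (level.size > 1) {
      level = level.grouped(Width).map(g => g.toArray[Any]).toVector
      depth += 1
    }
    Trie(level.head, depth, elems.size)
  }

  // Lookup: walk down from the root, using the highest 5-bit group first.
  def get(t: Trie, index: Int): Any = {
    require(index >= 0 && index < t.size, "index out of bounds")
    var node  = t.root
    var shift = Bits * (t.depth - 1)
    while (shift > 0) {
      node = node((index >>> shift) & Mask).asInstanceOf[Array[Any]]
      shift -= Bits
    }
    node(index & Mask)
  }

  // Functional update: copy only the nodes on the path to `index`
  // (at most `depth` arrays of 32 slots); everything else is shared.
  def updated(t: Trie, index: Int, value: Any): Trie = {
    require(index >= 0 && index < t.size, "index out of bounds")
    def go(node: Array[Any], shift: Int): Array[Any] = {
      val copy = node.clone()
      val slot = (index >>> shift) & Mask
      if (shift == 0) copy(slot) = value
      else copy(slot) = go(node(slot).asInstanceOf[Array[Any]], shift - Bits)
      copy
    }
    t.copy(root = go(t.root, Bits * (t.depth - 1)))
  }
}
```

For example, `TrieSketch.build(0 until 2000)` needs three levels, and `TrieSketch.updated(t, 1500, -1)` returns a new trie while the original `t` still holds 1500 at that index.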

UPDATE: In reply to Lai Yu-Hsuan's comment about complexity:

I will have to assume you meant "depth" here :-D. The legend for "eC" says "The operation takes effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys.".

If you are willing to consider the worst case, and given that there is an upper bound on the size of a vector, then yes, we can say that the complexity is constant. Say the maximum size is 2^32: the worst case is then at most 7 operations (one per trie level, since each level consumes 5 bits of the index). Of course, we could pick an upper bound for any type of collection and call its worst case constant, but for a list, for example, that constant would be about 4 billion, which is not practical.

Vector is the opposite: 7 operations is more than practical, which is why we can afford to consider its complexity constant in practice.

Another way to look at this: we are not talking about log(2,N), but log(32,N). If you plot that, you'll see it is practically a horizontal line. So, pragmatically speaking, you will never see much increase in processing time as the collection grows. Yes, that is still not truly constant (which is why it is marked "eC" and not just "C"), and you will be able to see a difference for short vectors (but again a very small one, because the number of operations grows so slowly).
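To put numbers on that comparison, here is a small sketch that counts how many levels a tree needs for N elements at a given branching factor. `DepthSketch` and `levels` are hypothetical names for this illustration, not part of the Scala library.

```scala
object DepthSketch {
  // Number of levels a tree with the given branching factor needs so that
  // every index in 0 until n is addressable (one index "digit" per level).
  def levels(n: Long, branching: Int): Int = {
    require(n >= 1 && branching >= 2)
    var remaining = n - 1   // largest index that must be reachable
    var depth = 1
    while (remaining >= branching) {
      remaining /= branching
      depth += 1
    }
    depth
  }

  def main(args: Array[String]): Unit = {
    // Compare a binary tree with a 32-ary trie as n grows.
    for (n <- Seq(32L, 1024L, 1000000L, 1L << 32))
      println(f"n=$n%12d  binary=${levels(n, 2)}%2d  32-ary=${levels(n, 32)}%d")
  }
}
```

For N = 2^32 this gives 32 levels for a binary tree but only 7 for a 32-ary one, which is the "7 operations" figure above.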

Régis Jean-Gilles answered Sep 21 '22 04:09