Vector is a new collection type in Scala 2.8 that addresses the inefficiency for random access on lists. Vectors allow accessing any element of the list in "effectively" constant time. It's a larger constant than for access to the head of a list or for reading an element of an array, but it's a constant nonetheless.
For random access, vector is better. For head, tail access, list is better. For bulk operation, such as map, filter, vector is preferred since vector is organized with 32 elements as a chunk whereas list organized the elements with pointers to each other there is no guarantee these elements are close to each other.
Vector is an immutable data structure in Scala. It was introduced in Scala version 2.8 to address the inefficiency of random access of other existing data structures.
A Seq is an Iterable that has a defined order of elements. Sequences provide a method apply() for indexing, ranging from 0 up to the length of the sequence. Seq has many subclasses including Queue, Range, List, Stack, and LinkedList. A List is a Seq that is implemented as an immutable linked list.
As a general rule, default to using Vector
. It’s faster than List
for almost everything and more memory-efficient for larger-than-trivial sized sequences. See this documentation of the relative performance of Vector compared to the other collections. There are some downsides to going with Vector
. Specifically:
List
(though not by as much as you might think)Another downside before Scala 2.10 was that pattern matching support was better for List
, but this was rectified in 2.10 with generalized +:
and :+
extractors.
There is also a more abstract, algebraic way of approaching this question: what sort of sequence do you conceptually have? Also, what are you conceptually doing with it? If I see a function that returns an Option[A]
, I know that function has some holes in its domain (and is thus partial). We can apply this same logic to collections.
If I have a sequence of type List[A]
, I am effectively asserting two things. First, my algorithm (and data) is entirely stack-structured. Second, I am asserting that the only things I’m going to do with this collection are full, O(n) traversals. These two really go hand-in-hand. Conversely, if I have something of type Vector[A]
, the only thing I am asserting is that my data has a well defined order and a finite length. Thus, the assertions are weaker with Vector
, and this leads to its greater flexibility.
Well, a List
can be incredibly fast if the algorithm can be implemented solely with ::
, head
and tail
. I had an object lesson of that very recently, when I beat Java's split
by generating a List
instead of an Array
, and couldn't beat that with anything else.
However, List
has a fundamental problem: it doesn't work with parallel algorithms. I cannot split a List
into multiple segments, or concatenate it back, in an efficient manner.
There are other kinds of collections that can handle parallelism much better -- and Vector
is one of them. Vector
also has great locality -- which List
doesn't -- which can be a real plus for some algorithms.
So, all things considered, Vector
is the best choice unless you have specific considerations that make one of the other collections preferable -- for example, you might choose Stream
if you want lazy evaluation and caching (Iterator
is faster but doesn't cache), or List
if the algorithm is naturally implemented with the operations I mentioned.
By the way, it is preferable to use Seq
or IndexedSeq
unless you want a specific piece of API (such as List
's ::
), or even GenSeq
or GenIndexedSeq
if your algorithm can be run in parallel.
Some of the statements here are confusing or even wrong, especially the idea that immutable.Vector in Scala is anything like an ArrayList. List and Vector are both immutable, persistent (i.e. "cheap to get a modified copy") data structures. There is no reasonable default choice as their might be for mutable data structures, but it rather depends on what your algorithm is doing. List is a singly linked list, while Vector is a base-32 integer trie, i.e. it is a kind of search tree with nodes of degree 32. Using this structure, Vector can provide most common operations reasonably fast, i.e. in O(log_32(n)). That works for prepend, append, update, random access, decomposition in head/tail. Iteration in sequential order is linear. List on the other hand just provides linear iteration and constant time prepend, decomposition in head/tail. Everything else takes in general linear time.
This might look like as if Vector was a good replacement for List in almost all cases, but prepend, decomposition and iteration are often the crucial operations on sequences in a functional program, and the constants of these operations are (much) higher for vector due to its more complicated structure. I made a few measurements, so iteration is about twice as fast for list, prepend is about 100 times faster on lists, decomposition in head/tail is about 10 times faster on lists and generation from a traversable is about 2 times faster for vectors. (This is probably, because Vector can allocate arrays of 32 elements at once when you build it up using a builder instead of prepending or appending elements one by one). Of course all operations that take linear time on lists but effectively constant time on vectors (as random access or append) will be prohibitively slow on large lists.
So which data structure should we use? Basically, there are four common cases:
For immutable collections, if you want a sequence, your main decision is whether to use an IndexedSeq
or a LinearSeq
, which give different guarantees for performance. An IndexedSeq provides fast random-access of elements and a fast length operation. A LinearSeq provides fast access only to the first element via head
, but also has a fast tail
operation. (Taken from the Seq documentation.)
For an IndexedSeq
you would normally choose a Vector
. Range
s and WrappedString
s are also IndexedSeqs.
For a LinearSeq
you would normally choose a List
or its lazy equivalent Stream
. Other examples are Queue
s and Stack
s.
So in Java terms, ArrayList
used similarly to Scala's Vector
, and LinkedList
similarly to Scala's List
. But in Scala I would tend to use List more often than Vector, because Scala has much better support for functions that include traversal of the sequence, like mapping, folding, iterating etc. You will tend to use these functions to manipulate the list as a whole, rather than randomly accessing individual elements.
In situations which involve a lot random access and random mutation, a Vector
(or – as the docs say – a Seq
) seems to be a good compromise. This is also what the performance characteristics suggest.
Also, the Vector
class seems to play nicely in distributed environments without much data duplication because there is no need to do a copy-on-write for the complete object. (See: http://akka.io/docs/akka/1.1.3/scala/stm.html#persistent-datastructures)
If you're programming immutably and need random access, Seq is the way to go (unless you want a Set, which you often actually do). Otherwise List works well, except it's operations can't be parallelized.
If you don't need immutable data structures, stick with ArrayBuffer since it's the Scala equivalent to ArrayList.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With