Is there somewhere I can find out the expected time and space complexities of operations on collections like HashSet, TreeSet, List and so on?
Is one just expected to know these from the properties of the abstract-data-types themselves?
I know of Performance characteristics for Scala collections, but this only mentions some very basic operations. Perhaps the rest of the operations for these collections are built purely from a small base-set, but then, it seems I am just expected to know that they have implemented them in this way?
The guide for the other methods should be - just think what an efficient implementation should look like.
Most other bulk-operations on collections (operations that process each element in the collection) are O(n)
, so they are not mentioned there. Examples are filter
, map
, foreach
, indexOf
, reverse
, find
...
Methods returning iterators or streams like combinations
and permutations
are usually O(1)
.
Methods involving 2 collections are usually O(max(n, m))
or O(min(n, m))
. These are zip
, zipAll
, sameElements
, corresponds
, ...
Methods union
, diff
, and intersect
are O(n + m)
.
Sort variants are, naturally, O(nlogn)
. The groupBy
is O(nlogn)
in the current implementation. The indexOfSlice
uses the KMP algorithm and is O(m + n)
, where m
and n
are lengths of the strings.
Methods such as +:
, :+
or patch
are generally O(n)
as well, unless you are dealing with a specific case of an immutable collection for which the operation in question is more efficient - for example, prepending an element on a functional List
or appending an element to a Vector
.
Methods toX
are generally O(n)
, as they have to iterate all the elements and create a new collection. An exception is toStream
which builds the collection lazily - so it's O(1)
. Also, whenever X
is the type of the collection toX
just returns this
, being O(1)
.
Iterator implementations should have an O(1)
(amortized) next
and hasNext
operations. Iterator creation should be worst-case O(logn)
, but O(1)
in most cases.
Performance characteristics of the other methods is really difficult to assert. Consider the following:
foreach
or iterator
, and at usually very high levels in the hierachy. Vector
's map
is implemented on collection.TraversableLike
, for example.
To add insult to injury, which method implementation is used depends on the linearization of the class inheritance. This also applies to any method called as a helper. It has happened before that changes here caused unforeseen performance problems.
Since foreach
and iterator
are both O(n)
, any improved performance depends on specialization at other methods, such as size
and slice
.So the result is that the place where the method is defined -- and documented -- does not have near enough information to state its performance characteristics, and may depend not only on how other methods are implemented by the inheriting collection, but even by the performance characteristics of an object, Builder, obtained from CanBuildFrom, that is passed at the call site.
At best, any such documentation would be described in terms of other methods. Which doesn't mean it isn't worthwhile, but it isn't easily done -- and hard tasks on open source projects depend on volunteers, who usually work at what they like, not what is needed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With