Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Asymptotic behaviour of Scala methods

Is there somewhere I can find out the expected time and space complexities of operations on collections like HashSet, TreeSet, List and so on?

Is one just expected to know these from the properties of the abstract-data-types themselves?

I know of Performance characteristics for Scala collections, but this only mentions some very basic operations. Perhaps the rest of the operations for these collections are built purely from a small base-set, but then, it seems I am just expected to know that they have implemented them in this way?

like image 290
MGwynne Avatar asked Oct 19 '11 10:10

MGwynne


2 Answers

The guide for the other methods should be - just think what an efficient implementation should look like.

Most other bulk-operations on collections (operations that process each element in the collection) are O(n), so they are not mentioned there. Examples are filter, map, foreach, indexOf, reverse, find ...

Methods returning iterators or streams like combinations and permutations are usually O(1).

Methods involving 2 collections are usually O(max(n, m)) or O(min(n, m)). These are zip, zipAll, sameElements, corresponds, ...

Methods union, diff, and intersect are O(n + m).

Sort variants are, naturally, O(nlogn). The groupBy is O(nlogn) in the current implementation. The indexOfSlice uses the KMP algorithm and is O(m + n), where m and n are lengths of the strings.

Methods such as +:, :+ or patch are generally O(n) as well, unless you are dealing with a specific case of an immutable collection for which the operation in question is more efficient - for example, prepending an element on a functional List or appending an element to a Vector.

Methods toX are generally O(n), as they have to iterate all the elements and create a new collection. An exception is toStream which builds the collection lazily - so it's O(1). Also, whenever X is the type of the collection toX just returns this, being O(1).

Iterator implementations should have an O(1) (amortized) next and hasNext operations. Iterator creation should be worst-case O(logn), but O(1) in most cases.

like image 108
axel22 Avatar answered Nov 15 '22 06:11

axel22


Performance characteristics of the other methods is really difficult to assert. Consider the following:

  • These methods are all implemented based on foreach or iterator, and at usually very high levels in the hierachy. Vector's map is implemented on collection.TraversableLike, for example. To add insult to injury, which method implementation is used depends on the linearization of the class inheritance. This also applies to any method called as a helper. It has happened before that changes here caused unforeseen performance problems. Since foreach and iterator are both O(n), any improved performance depends on specialization at other methods, such as size and slice.
  • For many of them, there's further dependency on the performance characteristics of the builder that was provided, which depends on the call site instead of the definition site.

So the result is that the place where the method is defined -- and documented -- does not have near enough information to state its performance characteristics, and may depend not only on how other methods are implemented by the inheriting collection, but even by the performance characteristics of an object, Builder, obtained from CanBuildFrom, that is passed at the call site.

At best, any such documentation would be described in terms of other methods. Which doesn't mean it isn't worthwhile, but it isn't easily done -- and hard tasks on open source projects depend on volunteers, who usually work at what they like, not what is needed.

like image 38
Daniel C. Sobral Avatar answered Nov 15 '22 08:11

Daniel C. Sobral