Understanding GenericTraversableTemplate and other Scala collection internals

Question

I was exchanging emails with an acquaintance that is a big Kotlin, Clojure and Java8 fan and asked him why not Scala. He provided many reasons (Scala is too academic, too many features, not the first time I hear this and I think this is very subjective) but his biggest pain point was as an example, that he doesn't like a language where he can't understand the implementation of basic data structures, and he gave LinkedList as an example.

I took a look at scala.collection.LinkedList and counted the things I either understand or somewhat understand.

CanBuildFrom - after some effort, I get it, type classes, not the longest suicide note in history [1]
LinkedListLike - I can't remember where I read it, but I got convinced this is there for a good reason

But then I started to stare at these

GenericTraversableTemplate - now I'm scratching my head as well...
SeqFactory, GenericCompanion - OK, now you lost me, I start to understand his point

Can someone who understand this well please explain GenericTraversableTemplate SeqFactory and GenericCompanion in the context of LinkedList? What they are for, what impact on the end user they have (e.g. I'm sure they are there for a good reason, what is that reason?)

Are they there for a practical reason? or is it a level of abstraction that could have been simplified?

I like Scala collections because I don't have to understand the internals to be able to effectively use them. I don't mind a complex implementation if it helps me to keep my usage simpler. e.g. I don't mind paying the price of a complex library if I get the ability to write cleaner more elegant code using it in return. but it will sure be nice to better understand it.

[1] - Is the Scala 2.8 collections library a case of "the longest suicide note in history"?

Andrey Tyukin · Accepted Answer

I will try to describe the concepts from the point of view of a random pedestrian (I've never contributed a single line to the Scala collection library, so don't hit me too hard if I'm wrong).

Since LinkedList is now deprecated, and because Maps provide a better example, I will use TreeMap as example.

CanBuildFrom

The motivation is this: If we take a TreeMap[Int, Int] and map it with

case (x, y) => (2 * x, y * y * 0.3d)

we get TreeMap[Int, Double]. This type safety alone would already explain the necessity for simple genericBuilder[X] constructs. However, if we map it with

case (x, y) => x

we obtain an Iterable[Int] (more precisely: a List[Int]), this is no longer a Map, the type of the container has changed. This is where CBF's come into play:

CanBuildFrom[This, X, That]

can be seen as a kind of "type-level function" that tells us: if we map a collection of type This with a function that returns values of type X, we can build a That. The most specific CBF is provided at compile time, in the first case it will be something like

CanBuildFrom[TreeMap[_,_], (X,Y), TreeMap[X,Y]]

in the second case it will be something like

CanBuildFrom[TreeMap[_,_], X, Iterable[X]]

and so we always get the right type of the container. The pattern is pretty general. Every time you have a generic function

foo[X1, ..., Xn](x1: X1, ..., xn: Xn): Y

where the result type Y depends on X1, ..., Xn, you can introduce an implicit parameter as follows:

foo[X1, ...., Xn, Y](x1: X1, ..., xn: Xn)(implicit CanFooFrom[X1, ..., Xn, Y]): Y

and then define the type-level function X1, ..., Xn -> Y piecewise by providing multiple implicit CanFooFrom's.

LinkedListLike

In the class definition, we see something like this:

TreeMap[A, B] extends SortedMap[A, B] with SortedMapLike[A, B, TreeMap[A, B]]

This is Scala's way to express the so-called F-bounded polymorphism. The motivation is as follows: Suppose we have a dozen (or at least two...) implementations of the trait SortedMap[A, B]. Now we want to implement a method withoutHead, it could look somewhat like this:

def withoutHead = this.remove(this.head)

If we move the implementation into SortedMap[A, B] itself, the best we can do is this:

def withoutHead: SortedMap[A, B] = this.remove(this.head)

But is this the most specific result type we can get? No, that's too vague. We would like to return TreeMap[A, B] if the original map is a TreeMap, and CrazySortedLinkedHashMap (or whatever...) if the original was a CrazySortedLinkedHashMap. This is why we move the implementation into SortedMapLike, and give the following signature to the withoutHead method:

trait SortedMapLike[A, B, Repr <: SortedMap[A, B]] {
  ...
  def withoutHead: Repr = this.remove(this.head)
}

now because TreeMap[A, B] extends SortedMapLike[A, B, TreeMap[A, B]], the result type of withoutHead is TreeMap[A,B]. The same holds for CrazySortedLinkedHashMap: we get the exact type back. In Java, you would either have to return SortedMap[A, B] or override the method in each subclass (which turned out to be a maintenance nightmare for the feature-rich traits in Scala)

GenericTraversableTemplate

The type is: GenericTraversableTemplate[+A, +CC[X] <: GenTraversable[X]]

As far as i can tell, this is just a trait that provides implementations of methods that somehow return regular collections with same container type but possibly different content type (stuff like flatten, transpose, unzip).

Stuff like foldLeft, reduce, exists are not here because these methods care only about content type, not container type.

Stuff like flatMap is not here, because the container type can change (again, CBF's).

Why is it a separate trait, is there a fundamental reason why it exists? I don't think so... It probably would be possible to group the godzillion of methods somewhat differently. But this is just what happens naturally: you start to implement a trait, and it turns out that it has very many methods. So instead you group loosely related methods, and put them into 10 different traits with awkward names like "GenTraversableTemplate", and them mix them all into traits/classes where you need them...

GenericCompanion

This is just an abstract class that implements some basic functionality which is common for companion objects of most collection classes (essentially, it just implements very simple factory methods apply(varargs) and empty).

For example there is method apply that takes varargs of some type A and returns a collection of type CC[A]:

Array(1, 2, 3, 4) // calls Array.apply[A](elems: A*) on the companion object
List(1, 2, 3, 4) // same for List

The implementation is very simple, it's something like this:

def apply[A](varargs: A*): CC[A] = {
  val builder = newBuilder[A]
  for (arg <- varargs) builder += arg
  builder.result()
}

This is obviously the same for Arrays and Lists and TreeMaps and almost everything else, except 'constrained irregular Collections' like Bitset. So this is just common functionality in a common ancestor class of most companion objects. Nothing special about that.

SeqFactory

Similar to GenericCompanion, but this time more specifically for Sequences. Adds some common factory methods like fill() and iterate() and tabulate() etc. Again, nothing particularly rocket-scientific here...

Few general remarks

In general: I don't think that one should attempt to understand every single trait in this library. Rather, one should try to look at the library as a whole. As a whole, it has a very interesting architecture. And in my personal opinion, it's actually a very aesthetic piece of software, but one has to stare at it for quite a while (and try to re-implement the whole architectural pattern several times) to grasp it. On the other hand: for example CBF's are kind of "design pattern" that clearly should be eliminated in successors of this language. The whole story with the scope of implicit CBF's still seems like a total nightmare to me. But many things seemed completely inscrutable at first, and almost always, it ended with an epiphany (which is very specific for Scala: for the majority of other languages, such struggles usually end with the thought "Author of this is a complete idiot").

Understanding GenericTraversableTemplate and other Scala collection internals

Tags:

scala

Eran Medan

1 Answers

Andrey Tyukin

Recent Activity

Donate For Us

Understanding GenericTraversableTemplate and other Scala collection internals

Tags:

scala

Eran Medan

1 Answers

Andrey Tyukin

Related questions

Recent Activity

Donate For Us