Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can Scala collection's seq/par/view/force be seen as a violation of the uniform return type principle?

Most of the implementation complexity of the collection framework arises from the fact, that Scala can - unlike C#'s LINQ or other collection frameworks - return the "best" collection type for higher order functions:

val numbers = List(1,2,3,4,5)
numbers map (2*) // returns a List[Int] = List(2, 4, 6, 8)

val doubles = Array(1.0, 2.0, 3.0)
doubles filter (_ < 3) // returns Array[Double] = Array(1.0, 2.0)

Why does this principle not hold for methods like seq, par, view, force?

numbers.view.map(2*).force 
// returns Seq[Int] = List(2, 4, 6, 8)

numbers.seq 
// returns scala.collection.immutable.Seq[Int] = List(1, 2, 3, 4)

doubles.par.seq 
// returns scala.collection.mutable.ArraySeq[Double] = ArraySeq(1.0, 2.0, 3.0)

Is there a technical limitation which prevents it from working? Or is this by design/intent? Considering that LINQ is basically lazy, Scala's equivalent (view, force) isn't more type-safe (only when using the strict methods), right?

like image 936
soc Avatar asked Jun 17 '11 18:06

soc


2 Answers

It is possible to embed more type information into the parallel collection classes so that you get back the collection you've started from, that's true. This would mean that after turning a List into a ParVector by calling par (in O(n), because elements are copied into the vector) and then calling seq, you would again get a List. To obtain the list with seq, you would have to copy all the elements from the vector back into the list. What happens instead is:

  • ParVector gets converted back into a Vector when seq is called - it gets converted in O(1)
  • calling par again on this vector will give you a ParVector in O(1), since they both vector and the parallel vector share the same underlying data

Notice that a collection such as a list has to be restructured when turned into a parallel collection, otherwise the operations on it cannot be efficiently parallelized.

Thus, you don't have to repetitively pay for copying when calling par and seq - the conversions become much more efficient. Since the primary goal of parallel collections was to increase efficiency, this was deemed more important than the uniform return type principle.

like image 112
axel22 Avatar answered Nov 15 '22 19:11

axel22


Regarding the static return type of a collection method, C# supports overloading of LINQ extension methods (Select, Where etc.) and will automatically pick the most specific one in scope. So Seq.Select can return aSeq, Enumerable.Select can return an Enumerable etc. This is very similar to how the most specific implicit implementation is chosen in Scala.

Regarding the dynamic type, LINQ operations are implemented as extension methods so no dynamic dispatch is performed. So, (Seq as Enumerable).Select will return an Enumerable, not a Seq. In Scala, collection method invocations are dynamically dispatched, so the dynamic type will stay the same for map, filter etc. Both approaches have advantages/disadvantages. The only clean solution for this kind of problem is multimethods IMHO and neither language/runtime supports them efficiently.

As for runtime behaviour, LINQ always returns a lazily evaluated view of the collection. There is no method on the view which magically returns a collection of the original type, you manually have to specify that you want an array for example using the toArray method. IMHO, this is a cleaner and simpler approach than the one in Scala, and lazy collection operations compose better than strict ones, but this comes at the cost of an extra method call to get a strict collection for single collection operations.

like image 30
Jesper Nordenberg Avatar answered Nov 15 '22 18:11

Jesper Nordenberg