Most of the implementation complexity of the collection framework arises from the fact, that Scala can - unlike C#'s LINQ or other collection frameworks - return the "best" collection type for higher order functions:
val numbers = List(1,2,3,4,5)
numbers map (2*) // returns a List[Int] = List(2, 4, 6, 8)
val doubles = Array(1.0, 2.0, 3.0)
doubles filter (_ < 3) // returns Array[Double] = Array(1.0, 2.0)
Why does this principle not hold for methods like seq
, par
, view
, force
?
numbers.view.map(2*).force
// returns Seq[Int] = List(2, 4, 6, 8)
numbers.seq
// returns scala.collection.immutable.Seq[Int] = List(1, 2, 3, 4)
doubles.par.seq
// returns scala.collection.mutable.ArraySeq[Double] = ArraySeq(1.0, 2.0, 3.0)
Is there a technical limitation which prevents it from working?
Or is this by design/intent?
Considering that LINQ is basically lazy, Scala's equivalent (view
, force
) isn't more type-safe (only when using the strict methods), right?
It is possible to embed more type information into the parallel collection classes so that you get back the collection you've started from, that's true. This would mean that after turning a List
into a ParVector
by calling par
(in O(n), because elements are copied into the vector) and then calling seq
, you would again get a List
. To obtain the list with seq
, you would have to copy all the elements from the vector back into the list. What happens instead is:
ParVector
gets converted back into a Vector
when seq
is called - it gets converted in O(1)par
again on this vector will give you a ParVector
in O(1), since they both vector and the parallel vector share the same underlying dataNotice that a collection such as a list has to be restructured when turned into a parallel collection, otherwise the operations on it cannot be efficiently parallelized.
Thus, you don't have to repetitively pay for copying when calling par
and seq
- the conversions become much more efficient. Since the primary goal of parallel collections was to increase efficiency, this was deemed more important than the uniform return type principle.
Regarding the static return type of a collection method, C# supports overloading of LINQ extension methods (Select
, Where
etc.) and will automatically pick the most specific one in scope. So Seq.Select
can return aSeq
, Enumerable.Select
can return an Enumerable
etc. This is very similar to how the most specific implicit implementation is chosen in Scala.
Regarding the dynamic type, LINQ operations are implemented as extension methods so no dynamic dispatch is performed. So, (Seq as Enumerable).Select
will return an Enumerable
, not a Seq
. In Scala, collection method invocations are dynamically dispatched, so the dynamic type will stay the same for map
, filter
etc. Both approaches have advantages/disadvantages. The only clean solution for this kind of problem is multimethods IMHO and neither language/runtime supports them efficiently.
As for runtime behaviour, LINQ always returns a lazily evaluated view of the collection. There is no method on the view which magically returns a collection of the original type, you manually have to specify that you want an array for example using the toArray
method. IMHO, this is a cleaner and simpler approach than the one in Scala, and lazy collection operations compose better than strict ones, but this comes at the cost of an extra method call to get a strict collection for single collection operations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With