Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Deforestation in Scala collections

From the design of Scala's collections I understand that something like:

scala> BitSet(1,2,3) map (_ + "a")
res7: scala.collection.immutable.Set[String] = Set(1a, 2a, 3a)

doesn't build an intermediate datastructure: the new Set is built as the BitSet is iterated over using a Builder. In fact in that case it is obvious since a bitset of strings doesn't make sense.

What about maps from lists? I am pretty sure that the following builds an intermediate list:

scala> List(1,2,3) map (_ -> "foo") toMap
res8: scala.collection.immutable.Map[Int,java.lang.String] =
    Map(1 -> foo, 2 -> foo, 3 -> foo)

namely the list List((1,foo), (2,foo), (3,foo)). If not, then how? Now, what about the following?

scala> Map.empty ++ (List(1,2,3) map (_ -> "foo"))
res10: scala.collection.immutable.Map[Int,java.lang.String] =
    Map(1 -> foo, 2 -> foo, 3 -> foo)

This time, from what I seem to understand from the type of ++:

def ++ [B >: (A, B), That]
       (that: TraversableOnce[B])
       (implicit bf: CanBuildFrom[Map[A, B], B, That]): That

I think it might be the case that the map is built on the fly, and that no intermediate list is constructed.

Is it the case? If yes, is this the canonical way of ensuring deforestation or is there a more straightforward syntax?

like image 529
Paul Brauner Avatar asked Nov 21 '11 03:11

Paul Brauner


2 Answers

You can use breakOut to ensure that no intermediate collection is created. For example:

// creates intermediate list.
scala> List((3, 4), (9, 11)).map(_.swap).toMap 
res542: scala.collection.immutable.Map[Int,Int] = Map(4 -> 3, 11 -> 9)

scala> import collection.breakOut
import collection.breakOut

// doesn't create an intermediate list.
scala> List((3, 4), (9, 11)).map(_.swap)(breakOut) : Map[Int, Int]
res543: Map[Int,Int] = Map(4 -> 3, 11 -> 9)

You can read more about it here.

UPDATE:

If you read the definition of breakOut, you'll notice that it's basically a way of creating a CanBuildFrom object of expected type and passing it explicitly to the method. breakOut merely saves you from typing the following boilerplate.

// Observe the error message. This will tell you the type of argument expected.
scala> List((3, 4), (9, 11)).map(_.swap)('dummy)
<console>:16: error: type mismatch;
 found   : Symbol
 required: scala.collection.generic.CanBuildFrom[List[(Int, Int)],(Int, Int),?]
              List((3, 4), (9, 11)).map(_.swap)('dummy)
                                                ^

// Let's try passing the implicit with required type.
// (implicitly[T] simply picks up an implicit object of type T from scope.)
scala> List((3, 4), (9, 11)).map(_.swap)(implicitly[CanBuildFrom[List[(Int, Int)], (Int, Int), Map[Int, Int]]])
// Oops! It seems the implicit with required type doesn't exist.
<console>:16: error: Cannot construct a collection of type Map[Int,Int] with elements of type (Int, Int) based on a coll
ection of type List[(Int, Int)].
              List((3, 4), (9, 11)).map(_.swap)(implicitly[CanBuildFrom[List[(Int, Int)], (Int, Int), Map[Int, Int]]])

// Let's create an object of the required type ...
scala> object Bob extends CanBuildFrom[List[(Int, Int)], (Int, Int), Map[Int, Int]] {
     |   def apply(from: List[(Int, Int)]) = foo.apply
     |   def apply() = foo.apply
     |   private def foo = implicitly[CanBuildFrom[Nothing, (Int, Int), Map[Int, Int]]]
     | }
defined module Bob

// ... and pass it explicitly.
scala> List((3, 4), (9, 11)).map(_.swap)(Bob)
res12: Map[Int,Int] = Map(4 -> 3, 11 -> 9)

// Or let's just have breakOut do all the hard work for us.
scala> List((3, 4), (9, 11)).map(_.swap)(breakOut) : Map[Int, Int]
res13: Map[Int,Int] = Map(4 -> 3, 11 -> 9)
like image 189
missingfaktor Avatar answered Nov 03 '22 14:11

missingfaktor


Example 1) Correct, there's no intermediate List

2) Yes, you get an itermediate List.

3) Again yes, you get an intermeditate List from what you have in the parentheses. There's no "magic" going on. If you have something in parentheses, it gets evaluated first.

I'm not sure what you mean by "deforestation" here: according to Wikipedia it means eliminating tree structures. If you mean eliminating intermediate lists, you should use a view. See for example here: summing a transformation of a list of numbers in scala

So without intermediate results, your examples would be

BitSet(1,2,3).view.map(_ + "a").toSet

(toSet is required because otherwise you have an IterableView[String,Iterable[_]])

List(1,2,3).view.map(_ -> "foo").toMap

Map.empty ++ (List(1,2,3).view.map(_ -> "foo"))

There is also a force method for executing the transformation operations, but this seems to have a nasty habit of giving you a more general type back (perhaps someone can comment with a reason):

scala> Set(1,2,3).view.map(_ + 1).force
res23: Iterable[Int] = Set(2, 3, 4)
like image 42
Luigi Plinge Avatar answered Nov 03 '22 14:11

Luigi Plinge