Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala SortedMap.map method returns non-sorted map when static type is Map

I encountered some unauthorized strangeness working with Scala's SortedMap[A,B]. If I declare the reference to SortedMap[A,B] "a" to be of type Map[A,B], then map operations on "a" will produce a non-sorted map implementation.

Example:

import scala.collection.immutable._

object Test extends App {
    val a: Map[String, String] = SortedMap[String, String]("a" -> "s", "b" -> "t", "c" -> "u", "d" -> "v", "e" -> "w", "f" -> "x")
    println(a.getClass+": "+a)

    val b = a map {x => x}  // identity
    println(b.getClass+": "+b)
}

The output of the above is:

class scala.collection.immutable.TreeMap: Map(a -> s, b -> t, c -> u, d -> v, e -> w, f -> x)
class scala.collection.immutable.HashMap$HashTrieMap: Map(e -> w, f -> x, a -> s, b -> t, c -> u, d -> v)

The order of key/value pairs before and after the identity transformation is not the same.

The strange thing is that removing the type declaration from "a" makes this issue go away. That's fine in a toy example, but makes SortedMap[A,B] unusable for passing to methods that expect Map[A,B] parameters.

In general, I would expect higher order functions such as "map" and "filter" to not change the fundamental properties of the collections they are applied to.

Does anyone know why "map" is behaving like this?

like image 360
frig.neutron Avatar asked Sep 27 '12 20:09

frig.neutron


3 Answers

The map method, like most of the collection methods, isn't defined specifically for SortedMap. It is defined on a higher-level class (TraversableLike) and uses a "builder" to turn the mapped result into the correct return type.

So how does it decide what the "correct" return type is? Well, it tries to give you back the return type that it started out as. When you tell Scala that you have a Map[String,String] and ask it to map, then the builder has to figure out how to "build" the type for returning. Since you told Scala that the input was a Map[String,String], the builder decides to build a Map[String,String] for you. The builder doesn't know that you wanted a SortedMap, so it doesn't give you one.

The reason it works when you leave off the the Map[String,String] type annotation is that Scala infers that the type of a is SortedMap[String,String]. Thus, when you call map, you are calling it on a SortedMap, and the builder knows to construct a SortedMap for returning.

As far as your assertion that methods shouldn't change "fundamental properties", I think you're looking at it from the wrong angle. The methods will always give you back an object that conforms to the type that you specify. It's the type that defines the behavior of the builder, not the underlying implementation. When you think about like that, it's the type that forms the contract for how methods should behave.

Why might we want this?

Why is this the preferred behavior? Let's look at a concrete example. Say we have a SortedMap[Int,String]

val sortedMap = SortedMap[Int, String](1 -> "s", 2 -> "t", 3 -> "u", 4 -> "v")

If I were to map over it with a function that modifies the keys, I run the risk of losing elements when their keys clash:

scala> sortedMap.map { case (k, v) => (k / 2, v) }
res3: SortedMap[Int,String] = Map(0 -> s, 1 -> u, 2 -> v)

But hey, that's fine. It's a Map after all, and I know it's a Map, so I should expect that behavior.

Now let's say we have a function that accepts an Iterable of pairs:

def f(iterable: Iterable[(Int, String)]) = 
  iterable.map { case (k, v) => (k / 2, v) }

Since this function has nothing to do with Maps, it would be very surprising if the result of this function ever had fewer elements than the input. After all, map on a Iterable should produce the mapped version of each element. But a Map is an Iterable of pairs, so we can pass it into this function. So what happens in Scala when we do?

scala> f(sortedMap)
res4: Iterable[(Int, String)] = List((0,s), (1,t), (1,u), (2,v))

Look at that! No elements lost! In other words, Scala won't surprise us by violating our expectations about how map on an Iterable should work. If the builder instead tried to produce a SortedMap based on the fact that the input was a SortedMap, then our function f would have surprising results, and this would be bad.

So the moral of the story is: Use the types to tell the collections framework how to deal with your data. If you want your code to be able to expect that a map is sorted, then you should type it as SortedMap.

like image 67
dhg Avatar answered Oct 18 '22 05:10

dhg


The signature of map is:

def map[B, That](f: ((A, B)) ⇒ B)(implicit bf: CanBuildFrom[Map[A, B], B, That]): That

The implicit parameter bf is used to build the resulting collection. So in your example, since the type of a is Map[String, String], the type of bf is:

val cbf = implicitly[CanBuildFrom[Map[String, String], (String, String), Map[String, String]]]

Which just builds a Map[String, String] which doesn't have any of the properties of the SortedMap. See:

cbf() ++= List("b" -> "c", "e" -> "g", "a" -> "b") result

For more information, see this excellent article: http://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html

like image 31
3 revs Avatar answered Oct 18 '22 07:10

3 revs


As dyross points out, it's the Builder, which is chosen (via the CanBuildFrom) on the basis of the target type, which determines the class of the collection that you get out of a map operation. Now this might not be the behaviour that you wanted, but it does for example allow you select the target type:

val b: SortedMap[String, String] = a.map(x => x)(collection.breakOut) 

(breakOut gives a generic CanBuildFrom whose type is determined by context, i.e. our type annotation.)

So you could add some type parameters that allow you accept any sort of Map or Traversable (see this question), which would allow you do do a map operation in your method while retaining the correct type information, but as you can see it's not straightforward.

I think a much simpler approach is instead to define functions that you apply to your collections using the collections' map, flatMap etc methods, rather than by sending the collection itself to a method.

i.e. instead of

def f[Complex type parameters](xs: ...)(complex implicits) = ...
val result = f(xs)

do

val f: X => Y = ...
val results = xs map f
like image 39
Luigi Plinge Avatar answered Oct 18 '22 05:10

Luigi Plinge