
When applying `map` to a `Set` you sometimes want the result not to be a set but overlook this

Tags:

scala

Or how to avoid accidental removal of duplicates when mapping a Set?

This is a mistake I make very often. Look at the following code:

def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

The function is supposed to compute the total size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set, so all lists of the same size (for example, every list of size 1) are reduced to a single representative.
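A minimal sketch of the pitfall and two common ways around it, assuming only the standard library. The object name CountSubelements and the three method names are made up for illustration; they are not part of the original question.

```scala
// Illustrative sketch; object and method names are hypothetical.
object CountSubelements {
  // Original version: map on a Set yields a Set[Int], so equal sizes collapse.
  def countCollapsing[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

  // Convert to a Seq first, so that map keeps duplicate sizes.
  def countViaSeq[A](sl: Set[List[A]]): Int = sl.toSeq.map(_.size).sum

  // Or map over an iterator, which never deduplicates.
  def countViaIterator[A](sl: Set[List[A]]): Int = sl.iterator.map(_.size).sum
}
```

With Set(List(1), List(2), List(3, 4)) as input, countCollapsing returns 3 because the two lists of size 1 collapse into one, while the other two variants return 4.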

Is it just me having this problem? Is there something I can do to prevent this happening? I think I'd love to have two methods mapToSet and mapToSeq for Set. But there is no way to enforce this, and sometimes you don't locally notice that you are working with a Set.

Maybe it's even possible that you were writing code for a Seq and something changes in another class and the underlying object becomes a Set?

Or is there a best practice that keeps this situation from arising at all?

Remote edits break my code

Imagine the following situation:

val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2

You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq.

And it breaks if someone decides to make Graph return its nodes as a Set, even though this code was never touched and does not look suspicious (at least not to me; do you expect that every collection could end up being a Set?).

ziggystar asked Aug 12 '11 13:08



1 Answer

There are many possible "gotchas" if one expects a Seq and gets a Set. It's not a surprise that a method's implementation can depend on the type of the object and (with overloading) on the arguments. With Scala implicits, the method can even depend on the expected return type.

One way to defend against surprises is to label types explicitly, for example by annotating methods with return types even where that is not required. That way, if the type of graph.nodes is changed from Seq to Set, the programmer is made aware of the potential breakage.
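As a rough sketch of explicit type labelling on the caller's side: the Node and Graph definitions below are hypothetical stand-ins for the types in the question, which does not show them. The type ascription on the local value means the code stops compiling if nodes is later changed to a Set[Node], instead of silently producing a different result.

```scala
// Illustrative sketch; Node, Graph and TotalEdges are hypothetical stand-ins.
object TotalEdges {
  final case class Node(id: Int, getEdges: Seq[(Int, Int)])
  final case class Graph(nodes: Seq[Node])

  def totalEdges(graph: Graph): Int = {
    // Explicit type: a change of graph.nodes to Set[Node] becomes a
    // compile-time type mismatch on this line.
    val nodes: Seq[Node] = graph.nodes
    // Each undirected edge is seen from both endpoints, hence the division by 2.
    nodes.map(_.getEdges).map(_.size).sum / 2
  }
}
```

For a triangle graph, where each of the three nodes lists its two adjacent edges, totalEdges returns 3.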

For your specific issue, why not define your own mapToSeq method:

scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
           t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]

scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)

scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)

The advantage of passing breakOut as the CanBuildFrom is that the result is built directly as a Seq, so the conversion from a Set to a Seq has no additional overhead and no intermediate Set is created.

You can use the "pimp my library" pattern to make mapToSeq appear to be a method of the Traversable trait, which both Seq and Set inherit.
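One possible shape of such an extension is sketched below with an implicit class. The names MapToSeqSyntax and MapToSeqOps are hypothetical, and mapToSeq is not a standard library method. Note that this sketch swaps in an iterator-based implementation rather than breakOut, because collection.breakOut was removed in Scala 2.13; mapping over the iterator likewise avoids building an intermediate Set.

```scala
// Illustrative sketch; MapToSeqSyntax and MapToSeqOps are hypothetical names.
object MapToSeqSyntax {
  implicit class MapToSeqOps[A](xs: Iterable[A]) {
    // Maps every element and collects the results into a Vector,
    // so duplicates survive even when xs is a Set.
    def mapToSeq[B](f: A => B): Seq[B] = xs.iterator.map(f).toVector
  }
}
```

After `import MapToSeqSyntax._`, a call such as `Set(List(1), List(2)).mapToSeq(_.size)` yields a Seq with two elements, both equal to 1, where a plain map would have produced a one-element Set.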

Kipton Barros answered Nov 15 '22 04:11