Or how to avoid accidental removal of duplicates when mapping a Set?
This is a mistake I make very often. Look at the following code:
def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum
The function is supposed to count the total size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set, so all lists of the same size (all lists of size 1, for example) are reduced to a single representative.
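To make the effect concrete, here is a minimal sketch that puts the original function next to two variants that keep duplicate sizes. The object name CountDemo and the names countBuggy and countWithFold are made up for this illustration; the variants are only one possible way to sidestep the problem.

```scala
object CountDemo {
  // Original version: Set.map builds another Set, so equal sizes collapse.
  def countBuggy[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

  // Mapping over an iterator keeps every size, duplicates included.
  def countSubelements[A](sl: Set[List[A]]): Int = sl.iterator.map(_.size).sum

  // Alternative: fold directly, without any intermediate collection.
  def countWithFold[A](sl: Set[List[A]]): Int = sl.foldLeft(0)(_ + _.size)
}
```

For Set(List(1), List(2), List(3, 4)) the buggy version returns 3, because the sizes 1, 1, 2 collapse to Set(1, 2), while the other two return 4.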
Is it just me having this problem? Is there something I can do to prevent this from happening? I think I'd love to have two methods, mapToSet and mapToSeq, for Set. But there is no way to enforce this, and sometimes you don't locally notice that you are working with a Set.
Maybe it's even possible that you were writing code for a Seq and something changes in another class and the underlying object becomes a Set?
Is there maybe a best practice that keeps this situation from arising at all?
Imagine the following situation:
val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2
You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq.
And it breaks if someone decides to make Graph return its nodes as a Set, without this code looking suspicious (at least not to me; do you expect that every collection could possibly end up being a Set?) and without anyone touching it.
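A small sketch of how this plays out, assuming hypothetical Edge and Node case classes (the question does not show the real ones) and a triangle graph in which every node has two adjacent edges. The two functions have the same body and differ only in the collection type of their argument.

```scala
object GraphDemo {
  // Hypothetical stand-ins for the graph types in the question.
  final case class Edge(from: Int, to: Int)
  final case class Node(id: Int, getEdges: Seq[Edge])

  // A triangle: three nodes, each listing its two adjacent edges.
  val triangle: Seq[Node] = Seq(
    Node(1, Seq(Edge(1, 2), Edge(1, 3))),
    Node(2, Seq(Edge(2, 1), Edge(2, 3))),
    Node(3, Seq(Edge(3, 1), Edge(3, 2)))
  )

  // With a Seq the sizes 2, 2, 2 are all kept: (2 + 2 + 2) / 2 = 3.
  def totalEdgesSeq(nodes: Seq[Node]): Int =
    nodes.map(_.getEdges).map(_.size).sum / 2

  // Same body, but with a Set the sizes collapse to Set(2): 2 / 2 = 1.
  def totalEdgesSet(nodes: Set[Node]): Int =
    nodes.map(_.getEdges).map(_.size).sum / 2
}
```

Calling totalEdgesSeq(triangle) gives 3, while totalEdgesSet(triangle.toSet) gives 1, although the expression itself never changed.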
It seems there will be many possible gotchas if one expects a Seq and gets a Set. It's not a surprise that method implementations can depend on the type of the object and (with overloading) the arguments. With Scala implicits, the method can even depend on the expected return type.
A way to defend against surprises is to explicitly label types, for example by annotating methods with return types even where that is not required. At least this way, if the type of graph.nodes is changed from Seq to Set, the programmer is aware that there's potential breakage.
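As a rough illustration of that idea, the sketch below pins the expected type at the use site with a local annotation. Node, Graph and the edgeCount field are hypothetical stand-ins, not the types from the question.

```scala
object AnnotationDemo {
  // Hypothetical stand-ins for the graph types in the question.
  final case class Node(id: Int, edgeCount: Int)
  class Graph(val nodes: Seq[Node])

  def totalEdges(graph: Graph): Int = {
    // The explicit annotation documents the assumption. If Graph.nodes is
    // later changed to Set[Node], this line stops compiling instead of
    // silently dropping duplicate counts further down.
    val nodes: Seq[Node] = graph.nodes
    nodes.map(_.edgeCount).sum / 2
  }
}
```

The cost is a little extra typing at every place that relies on Seq semantics; the benefit is that a change of collection type shows up as a compile error rather than as a wrong number.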
For your specific issue, why not define your own mapToSeq method:
scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]
scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)
scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)
The advantage of using breakOut: CanBuildFrom is that the conversion from a Set to a Seq has no additional overhead: the result is built directly as a Seq, without an intermediate Set.
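Note that collection.breakOut was removed in Scala 2.13. A possible equivalent that should compile on both older and newer versions is sketched below; it maps over an iterator, so no intermediate Set is built either. The object name MapToSeq and the choice of Vector as the concrete result type are assumptions made for this example.

```scala
object MapToSeq {
  // Maps lazily over the iterator, then materialises the results once.
  // toVector is strict, so the returned Seq is fully built.
  def mapToSeq[A, B](t: Iterable[A])(f: A => B): Seq[B] =
    t.iterator.map(f).toVector
}
```

With this definition, MapToSeq.mapToSeq(Set(Seq(1), Seq(2), Seq(1, 2)))(_.size) contains the sizes 1, 1 and 2, so the duplicate is preserved.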
You can use the pimp my library pattern to make mapToSeq appear to be part of the Traversable trait, inherited by Seq and Set.
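One way this could look, written as an implicit class (available since Scala 2.10). The names MapToSeqSyntax and MapToSeqOps are invented for this sketch, and it targets Iterable rather than Traversable so that it also compiles on newer Scala versions.

```scala
object MapToSeqSyntax {
  // Enriches every Iterable (Seq, Set, ...) with a mapToSeq method.
  implicit class MapToSeqOps[A](val self: Iterable[A]) extends AnyVal {
    // Always yields a Seq, so duplicate results are never collapsed.
    def mapToSeq[B](f: A => B): Seq[B] = self.iterator.map(f).toVector
  }
}
```

After import MapToSeqSyntax._, the function from the question can be written as sl.mapToSeq(_.size).sum, which reads almost like the original but keeps every size.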