When applying `map` to a `Set`, you sometimes want the result not to be a set, but overlook this

Tags:

scala

Or how to avoid accidental removal of duplicates when mapping a Set?

This is a mistake I make very often. Look at the following code:

def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

The function should count the accumulated size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set, so all lists with the same length are reduced to a single representative: for example, all lists of size 1 together contribute only 1 to the sum.
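A minimal sketch of the problem and two possible workarounds, assuming Scala 2.13 or Scala 3. The names `countSubelementsFixed` and `countSubelementsLazy` are hypothetical. Converting to a `Seq` first, or mapping through an `Iterator`, keeps one size per list, so equal sizes are no longer merged before summing.

```scala
// Original version: map on a Set yields a Set, so equal sizes collapse.
def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum

// Hypothetical fix 1: convert to a Seq first, so map keeps duplicate sizes.
def countSubelementsFixed[A](sl: Set[List[A]]): Int = sl.toSeq.map(_.size).sum

// Hypothetical fix 2: map through an Iterator; no intermediate Set is built.
def countSubelementsLazy[A](sl: Set[List[A]]): Int = sl.iterator.map(_.size).sum

val lists = Set(List(1), List(2), List(3, 4))
val collapsed = countSubelements(lists)      // sizes become Set(1, 2), so 3
val fixed     = countSubelementsFixed(lists) // sizes 1, 1, 2, so 4
val lazySum   = countSubelementsLazy(lists)  // also 4
```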

Is it just me having this problem? Is there something I can do to prevent this from happening? I'd love to have two methods on Set, mapToSet and mapToSeq. But there is no way to enforce their use, and sometimes you don't notice locally that you are working with a Set.

It can even happen that you write code for a Seq, and later a change in another class turns the underlying object into a Set.

Is there a best practice that keeps this situation from arising at all?

Remote edits break my code

Imagine the following situation:

val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2

You fetch a collection of Node objects from a graph, get each node's adjacent edges, and sum their sizes (dividing by 2 because every edge is counted once at each of its two endpoints). This works if graph.nodes returns a Seq.

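And it breaks if someone decides to make Graph return its nodes as a Set: the edge collections, and then their sizes, end up in Sets, so equal sizes are counted only once. This happens without anyone touching this code and without the code looking suspicious (at least not to me; do you expect that every collection could end up being a Set?).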
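A small reproduction of the breakage, assuming Scala 2.13 or Scala 3. `Edge`, `Node`, `getEdges`, and `totalEdges` are hypothetical stand-ins for the question's graph classes. On a triangle every node has two edges, so the Set version collapses the sizes to `Set(2)`; mapping through an `Iterator` gives the same answer for both collection types.

```scala
// Hypothetical stand-ins for the graph classes in the question.
final case class Edge(a: Int, b: Int)
final case class Node(id: Int, edges: Set[Edge]) {
  def getEdges: Set[Edge] = edges
}

// Triangle 1-2-3: every node has exactly two adjacent edges.
val e12 = Edge(1, 2); val e23 = Edge(2, 3); val e13 = Edge(1, 3)
val nodeSeq: Seq[Node] =
  Seq(Node(1, Set(e12, e13)), Node(2, Set(e12, e23)), Node(3, Set(e23, e13)))
val nodeSet: Set[Node] = nodeSeq.toSet

// The expression from the question, applied to both collection types.
val fromSeq = nodeSeq.map(_.getEdges).map(_.size).sum / 2 // (2 + 2 + 2) / 2 = 3
val fromSet = nodeSet.map(_.getEdges).map(_.size).sum / 2 // sizes collapse to Set(2): 1

// A variant that does not depend on the collection type: map through an Iterator.
def totalEdges(nodes: Iterable[Node]): Int = nodes.iterator.map(_.getEdges.size).sum / 2
```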


asked Aug 12 '11 13:08

ziggystar

1 Answer

It seems there will be many possible gotchas if one expects a Seq and gets a Set. It's no surprise that method implementations can depend on the type of the object and (with overloading) on the types of the arguments. With Scala implicits, a method's behavior can even depend on the expected return type.

One way to defend against surprises is to label types explicitly, for example by annotating methods with their return types even when this is not required. That way, at least, if the type of graph.nodes is changed from Seq to Set, the programmer making the change is aware that there's potential breakage.
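A minimal sketch of that defense, assuming Scala 2.13 or Scala 3; `Graph`, `Node`, and `degree` are hypothetical stand-ins, not a real graph library. The explicit `Seq[Node]` on `nodes`, together with a type ascription at the call site, turns a silent change of collection type into a compile error.

```scala
// Hypothetical node type: degree is the number of adjacent edges.
final case class Node(id: Int, degree: Int)

class Graph(nodeVector: Vector[Node]) {
  // The explicit return type is the contract: callers get a Seq, duplicates kept.
  // Returning a Set here would not compile without also changing this signature.
  def nodes: Seq[Node] = nodeVector
}

val graph = new Graph(Vector(Node(1, 2), Node(2, 2), Node(3, 2)))

// Caller-side ascription: if Graph.nodes were changed to return Set[Node],
// this line would stop compiling (a Set is not a Seq), flagging the change.
val nodeSeq: Seq[Node] = graph.nodes
val totalEdges: Int = nodeSeq.map(_.degree).sum / 2 // (2 + 2 + 2) / 2 = 3
```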

For your specific issue, why not define your own mapToSeq method:

scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
           t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]

scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)

scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)

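The advantage of using breakOut, which supplies the CanBuildFrom, is that the mapped elements are built directly into a Seq, so the conversion from a Set to a Seq has no additional overhead: no intermediate Set is created.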
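`collection.breakOut` and `CanBuildFrom` were removed in Scala 2.13, so the REPL session above only compiles on Scala 2.12 and earlier. A sketch of the same helper for Scala 2.13 or Scala 3, assuming that mapping through an `Iterator` is an acceptable replacement: the iterator maps lazily and `to(Seq)` builds the result directly, so, as with `breakOut`, no intermediate `Set` is created. `Iterable` replaces `Traversable`, which is deprecated in 2.13.

```scala
// Scala 2.13+/3 version of mapToSeq: lazy map, then build a Seq directly,
// so duplicates in the mapped results are kept and no Set is built on the way.
def mapToSeq[A, B](t: Iterable[A])(f: A => B): Seq[B] =
  t.iterator.map(f).to(Seq)

val fromSet = mapToSeq(Set(Seq(1), Seq(1, 2)))(_.sum) // contains 1 and 3
val fromSeq = mapToSeq(Seq(Seq(1), Seq(1, 2)))(_.sum) // Seq(1, 3)
val dupes   = mapToSeq(Set(List(1), List(2)))(_.size) // both sizes kept: Seq(1, 1)
```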

You can use the "pimp my library" pattern to make mapToSeq appear to be part of the Traversable trait, which is inherited by both Seq and Set.
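A sketch of that pattern with an implicit class, assuming Scala 2.13 or Scala 3 (so it enriches `Iterable` rather than the deprecated `Traversable`). `MapSyntax` and `MapToOps` are hypothetical names, and `mapToSet` is the second method wished for in the question. It does not stop anyone from calling plain `map`, but it lets the call site state which result type is intended.

```scala
object MapSyntax {
  // Hypothetical enrichment: adds mapToSeq and mapToSet to every Iterable,
  // so both Seq and Set pick them up.
  implicit class MapToOps[A](xs: Iterable[A]) {
    // Always returns a Seq with one result per element (duplicates preserved).
    def mapToSeq[B](f: A => B): Seq[B] = xs.iterator.map(f).to(Seq)
    // Always returns a Set, deduplicating results regardless of the source type.
    def mapToSet[B](f: A => B): Set[B] = xs.iterator.map(f).to(Set)
  }
}

import MapSyntax._

val lists = Set(List(1), List(2), List(3, 4))
val total = lists.mapToSeq(_.size).sum     // 1 + 1 + 2 = 4
val distinctSizes = lists.mapToSet(_.size) // Set(1, 2)
```

Because the methods live on `Iterable`, the call site reads the same whether `lists` is a Seq or a Set, and only the method name decides whether duplicates survive.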


answered Nov 15 '22 04:11

Kipton Barros
