Merge Sets of Sets that contain common elements in Scala

Tags:

scala-collections

I want to implement a function in Scala, that, given a Set of Sets of Ints will merge any containing Set that contains one or more common elements.

So for example, given:

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = ???

val sets = Set(Set(1,2), Set(2,3), Set(3,7), Set(8,10))  
val mergedSets = mergeSets(sets)

mergedSets will contain Set(Set(1,2,3,7), Set(8,10))

What would be a nice, efficient and functional if possible, way to do this in Scala?

834

asked Sep 02 '14 04:09

3 Answers

The most efficient way to do this will be using mutable structures, but you asked for a functional way, so here goes:

sets.foldLeft(Set.empty[Set[Int]])((cum, cur) => {
  val (hasCommon, rest) = cum.partition(_ & cur nonEmpty)
  rest + (cur ++ hasCommon.flatten)
})

(Not tested, wrote this using phone)

125

answered Nov 16 '22 01:11

A version that's largely in the spirit of samthebest's answer, but (by design) less deeply idiomatic. It may be more approachable for those new to functional programming. (It seems we should squeeze everything we can out of such a nice problem.)

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = {
  if (sets.isEmpty) {
    Set.empty[Set[Int]]
  } else {
    val cur = sets.head
    val merged = mergeSets(sets.tail)
    val (hasCommon, rest) = merged.partition(_ & cur nonEmpty)
    rest + (cur ++ hasCommon.flatten)
  }
}

However, the following alternative has the advantage of being tail recursive and perhaps also providing a smoother path to understanding samthebest's answer:

def mergeSets(cum: Set[Set[Int]], sets: Set[Set[Int]]): Set[Set[Int]] = {
  if (sets.isEmpty) {
    cum
  } else {
    val cur = sets.head
    val (hasCommon, rest) = cum.partition(_ & cur nonEmpty)
    mergeSets(rest + (cur ++ hasCommon.flatten), sets.tail)
  }
}

def mergeSets(sets: Set[Set[Int]]): Set[Set[Int]] = 
  mergeSets(Set.empty[Set[Int]], sets)

I don't claim either of these as superior: just useful as learning tools.

answered Nov 16 '22 02:11

Spiro Michaylov

Samthebest's terse solution is very satisfying in it's simplicity and elegance, but I am working with a a large number of sets and needed a more performant solution that is still immutable and written in good functional style.

For 10,000 sets with 10 elements each (randomly chosen ints from 0 to 750,000), samthebest's terse solution took an average of ~ 30sec on my computer, while my solution below took on average ~ 400ms.

(In case anyone was wondering, the resultant set for the above set cardinalities contains ~ 3600 sets, with an average of ~ 26 elements each)

If anyone can see any improvements I could make with respect to style or performance, please let me know!

Here's what I came up with:

val sets = Set(Set(1, 2), Set(2, 3), Set(4, 5))
Association.associate(sets) => Set(Set(1, 2, 3), Set(4, 5))


object Association {

  // Keep track of all current associations, as well as every element in any current association
  case class AssociationAcc[A](associations: Set[Set[A]] = Set.empty[Set[A]], all: Set[A] = Set.empty[A]) {
    def +(s: Set[A]) = AssociationAcc(associations + s, all | s)
  }

  // Add the newSet to the set associated with key A
  // (or simply insert if there is no such key).
  def updateMap[A](map: Map[A, Set[A]], key: A, newSet: Set[A]) = {
    map + (key -> (map.getOrElse(key, Set.empty) ++ newSet))
  }

  // Turn a Set[Set[A]] into a map where each A points to a set of every other A
  // it shared any set with.
  //
  // e.g. sets = Set(Set(1, 2), Set(2, 3), Set(4, 5))
  //     yields: Map(1 -> Set(2), 2 -> Set(1, 3), 3 -> Set(2),
  //                 4 -> Set(5), 5 -> Set(4))
  def createAssociationMap[A](sets: Set[Set[A]]): Map[A, Set[A]] = {
    sets.foldLeft(Map.empty[A, Set[A]]) { case (associations, as) =>
      as.foldLeft(associations) { case (assoc, a) => updateMap(assoc, a, as - a) }
    }
  }

  // Given a map where each A points to a set of every A it is associated with,
  // and also given a key A starting point, return the total set of associated As.
  //
  // e.g. with map = Map(1 -> Set(2), 2 -> Set(1, 3), 3 -> Set(2),
  //                     4 -> Set(5), 5 -> Set(4))
  // and key = 1 (or 2 or 3) yields: Set(1, 2, 3).
  // with key = 4 (or 5) yields: Set(4, 5)
  def getAssociations[A](map: Map[A, Set[A]], key: A, hit: Set[A] = Set.empty[A]): Set[A] = {
    val newAssociations = map(key) &~ hit
    newAssociations.foldLeft(newAssociations | hit + key) {
      case (all, a) => getAssociations(map, a, all)
    }
  }

  // Given a set of sets that may contain common elements, associate all sets that
  // contain common elements (i.e. take union) and return the set of associated sets.
  //
  // e.g. Set(Set(1, 2), Set(2, 3), Set(4, 5)) yields: Set(Set(1, 2, 3), Set(4, 5))
  def associate[A](sets: Set[Set[A]]): Set[Set[A]] = {
    val associationMap = createAssociationMap(sets)
    associationMap.keySet.foldLeft(AssociationAcc[A]()) {
      case (acc, key) =>
        if (acc.all.contains(key)) acc
        else                       acc + getAssociations(associationMap, key)
    }.associations
  }
}

answered Nov 16 '22 02:11

Eddie Carlson

Related questions
                            
                                Play framework: Server monitoring and performance admin page
                            
                                Overriding in Scala, overloaded Java methods distinguished by array type
                            
                                toSet and type inference
                            
                                Adding types in [ ] brackets in function definition
                            
                                Single Chinese character determined as length 2 in Java/Scala String
                            
                                How do I generate a random number using functional state?
                            
                                Conditionally include provided scope dependencies with sbt and the universal plugin
                            
                                How to run Spark Scala code on Amazon EMR
                            
                                Scala cats, traverse Seq
                            
                                Single documentation for mixed (Scala/Java) project?
                            
                                What is the most elegant way to deal with an external library with internal state using a function programming language?
                            
                                How to fallback Scala version for SBT dependencies?
                            
                                Filtering resources in SBT
                            
                                Scala 'fromFile' weirdness?
                            
                                Overriding Java interface with overloaded vargs methods in Scala
                            
                                In Scala 2.10, how do you create a ClassTag given a TypeTag
                            
                                Monadic fold with State monad in constant space (heap and stack)?
                            
                                Why "set" can't assign value to custom SettingKey I can "show" in sbt shell?
                            
                                Scala using toSet.toList vs distinct
                            
                                Verifying mocked object method calls with default arguments

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Merge Sets of Sets that contain common elements in Scala

Tags:

scala

scala-collections

spyk

People also ask

3 Answers

samthebest

Spiro Michaylov

Eddie Carlson

Recent Activity

Donate For Us