Here is my problem: I have a sequence S of (nonempty but possibly not distinct) sets s_i, and for each s_i need to know how many sets s_j in S (i ≠ j) are subsets of s_i.
I also need incremental performance: once I have all my counts, I may replace one set s_i by some subset of s_i and update the counts incrementally.
Performing all this using purely functional code would be a huge plus (I code in Scala).
As set inclusion is a partial ordering, I thought the best way to solve my problem would be to build a DAG that would represent the Hasse diagram of the sets, with edges representing inclusion, and join an integer value to each node representing the size of the sub-dag below the node plus 1. However, I have been stuck for several days trying to develop the algorithm that builds the Hasse diagram from the partial ordering (let's not talk about incrementality!), even though I thought it would be some standard undergraduate material.
Here is my data structure :
case class HNode[A] (
val v: A,
val child: List[HNode[A]]) {
val rank = 1 + child.map(_.rank).sum
}
My DAG is defined by a list of roots and some partial ordering:
class Hasse[A](val po: PartialOrdering[A], val roots: List[HNode[A]]) {
def +(v: A): Hasse[A] = new Hasse[A](po, add(v, roots))
private def collect(v: A, roots: List[HNode[A]], collected: List[HNode[A]]): List[HNode[A]] =
if (roots == Nil) collected
else {
val (subsets, remaining) = roots.partition(r => po.lteq(r.v, v))
collect(v, remaining.map(_.child).flatten, subsets.filter(r => !collected.exists(c => po.lteq(r.v, c.v))) ::: collected)
}
}
I am pretty stuck here. The last I came up to add a new value v to the DAG is:
collect
function, possibly non-optimal or even flawed).I have not yet implemented fully this algorithm, but it seems uncessarily circonvoluted and nonoptimal for my apparently simple problem. Is there some simpler algorithm available (Google was clueless on this)?
After some work, I finally ended up solving my problem, following my initial intuition. The collect method and rank evaluation were flawed, I rewrote them with tail-recursion as a bonus. Here is the code I obtained:
final case class HNode[A](
val v: A,
val child: List[HNode[A]]) {
val rank: Int = 1 + count(child, Set.empty)
@tailrec
private def count(stack: List[HNode[A]], c: Set[HNode[A]]): Int =
if (stack == Nil) c.size
else {
val head :: rem = stack
if (c(head)) count(rem, c)
else count(head.child ::: rem, c + head)
}
}
// ...
private def add(v: A, roots: List[HNode[A]]): List[HNode[A]] = {
val newNode = HNode(v, collect(v, roots, Nil))
attach(newNode, roots)
}
private def attach(n: HNode[A], roots: List[HNode[A]]): List[HNode[A]] =
if (roots.contains(n)) roots
else {
val (supersets, remaining) = roots.partition { r =>
// Strict superset to avoid creating cycles in case of equal elements
po.tryCompare(n.v, r.v) == Some(-1)
}
if (supersets.isEmpty) n :: remaining.filter(r => !po.lteq(r.v, n.v))
else {
supersets.map(s => HNode(s.v, attach(n, s.child))) ::: remaining
}
}
@tailrec
private def collect(v: A, stack: List[HNode[A]], collected: List[HNode[A]]): List[HNode[A]] =
if (stack == Nil) collected
else {
val head :: tail = stack
if (collected.exists(c => po.lteq(head.v, c.v))) collect(v, tail, collected)
else if (po.lteq(head.v, v)) collect(v, tail, head :: (collected.filter(c => !po.lteq(c.v, head.v))))
else collect(v, head.child ::: tail, collected)
}
I now must check some optimization: - cut off branches with totally distinct sets when collecting subsets (as Rex Kerr suggested) - see if sorting the sets by size improves the process (as mitchus suggested)
The following problem is to work the (worst case) complexity of the add() operation out. With n the number of sets, and d the size of the largest set, the complexity will probably be O(n²d), but I hope it can be refined. Here is my reasoning: if all sets are distinct, the DAG will be reduced to a sequence of roots/leaves. Thus, every time I try to add a node to the data structure, I still have to check for inclusion with each node already present (both in collect and attach procedures). This leads to 1 + 2 + … + n = n(n+1)/2 ∈ O(n²) inclusion checks.
Each set inclusion test is O(d), hence the result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With