
Collection of sets containing no sets which are a subset of another in the collection

I am looking for an abstract data structure which represents a collection of sets such that no set in the collection is a subset of another set in the collection.

This means that on insert the following conditions will be met:

A. Inserting a set that is already a subset of another set in the collection will return the original collection.

B. Inserting a set that is a superset of any sets already in the collection will result in a collection with the superset added and those subsets removed.

Assuming an ordering on the elements of the sets, a prefix tree can be used to represent the collection. This permits condition A to be handled very quickly (i.e. it takes no longer to check the condition than it would to insert the subset); however, meeting condition B takes time.
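For illustration, here is a minimal Python sketch of one way to read the prefix-tree idea: each set is stored as the sorted sequence of its elements, and the condition A check walks the tree, skipping stored elements smaller than the next query element. The names `TrieNode`, `trie_insert` and `has_superset` are made up for this sketch, and it assumes nothing is ever deleted from the tree. Note that the lookup may branch, so it is not guaranteed to be as cheap as a plain insert.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # element -> TrieNode
        self.terminal = False  # True if a stored set ends at this node


def trie_insert(root, s):
    """Store set s as the path of its elements in sorted order."""
    node = root
    for e in sorted(s):
        node = node.children.setdefault(e, TrieNode())
    node.terminal = True


def has_superset(node, query, i=0):
    """True if some stored set contains every element of query[i:].

    query must be a sorted list. Assumes no deletions, so every
    non-root node lies on a path that ends at a stored set.
    """
    if i == len(query):
        return node.terminal or bool(node.children)
    for e, child in node.children.items():
        if e == query[i]:
            # Matched the next wanted element; move on to the following one.
            if has_superset(child, query, i + 1):
                return True
        elif e < query[i]:
            # Extra element of the stored set; keep looking for query[i].
            if has_superset(child, query, i):
                return True
        # e > query[i]: query[i] cannot occur deeper on this sorted branch.
    return False


root = TrieNode()
trie_insert(root, {1, 2, 4})
trie_insert(root, {2, 3})
print(has_superset(root, [2, 4]))  # condition A holds for {2, 4}
print(has_superset(root, [3, 4]))  # no stored set contains {3, 4}
```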

I am wondering if there is a data structure that allows B to be met quickly as well.

Mark Wassell asked Nov 15 '09 09:11



1 Answer

The trivial approach would be to keep a list of sets and perform a linear search through it for every incoming set (testing whether the incoming set is a subset of each stored set).

The linear search visits O(n) stored sets, and each subset test costs up to O(m) for an incoming set of size m (assuming constant-time membership tests). Thus O(n*m) total time per insert (number of sets times size of each set).
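As a rough sketch of the trivial approach in Python (the function name `insert` and the list-of-frozensets representation are assumptions made for this example), covering both condition A and condition B from the question:

```python
def insert(collection, new_set):
    """Return the collection with new_set inserted, keeping it free of nested sets.

    collection is a list of frozensets in which no set is a subset of another.
    """
    new_set = frozenset(new_set)
    # Condition A: new_set is already covered by a stored set, so nothing changes.
    if any(new_set <= s for s in collection):
        return collection
    # Condition B: drop every stored set that new_set covers, then add new_set.
    return [s for s in collection if not s <= new_set] + [new_set]


family = []
for s in [{1, 2}, {2, 3}, {1}, {1, 2, 3}, {4}]:
    family = insert(family, s)
print(family)
```

Each call makes one or two passes over the list, with one subset test per stored set, which is where the O(n*m) estimate comes from.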

The most obvious optimization, of course, is to index on set sizes. Then you only test each incoming set against those which are of equal or larger size. (A set cannot be a subset of any smaller set, duh!).
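A small Python sketch of the size index, under the assumption that sets are bucketed in a dictionary keyed by size (the class name `SizeIndexedFamily` and its methods are hypothetical). The same index also narrows condition B from the question, since only strictly smaller sets can be removed once condition A has failed:

```python
from collections import defaultdict


class SizeIndexedFamily:
    def __init__(self):
        self.by_size = defaultdict(set)  # size -> stored frozensets of that size

    def insert(self, new_set):
        """Insert new_set; return False if it was already covered (condition A)."""
        new_set = frozenset(new_set)
        k = len(new_set)
        # Condition A: only stored sets of size >= k can contain new_set.
        for size, bucket in self.by_size.items():
            if size >= k and any(new_set <= s for s in bucket):
                return False
        # Condition B: only stored sets of size < k can be covered by new_set
        # (an equal-sized subset would be new_set itself, ruled out above).
        for size, bucket in self.by_size.items():
            if size < k:
                bucket.difference_update({s for s in bucket if s <= new_set})
        self.by_size[k].add(new_set)
        return True

    def sets(self):
        return {s for bucket in self.by_size.values() for s in bucket}


fam = SizeIndexedFamily()
fam.insert({1, 2})
fam.insert({1})        # covered by {1, 2}, so rejected
fam.insert({1, 2, 3})  # replaces {1, 2}
print(fam.sets())
```

This only prunes candidates; within the buckets that remain, the search is still linear.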

The next optimization that comes to mind is to create an index of elements. Thus for each incoming set you'd find the intersection of the sets containing each of its elements. In other words if, for incoming set {a,b,c}, we find that element a exists in sets A, B, and D, element b exists in B, E, and F, and element c exists in A, B, and Z ... then the incoming set is a subset of B (the intersection of {A, B, D}, {B, E, F}, and {A, B, Z}).
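The example above can be sketched in Python roughly as follows. The contents of the sets A, B, D, E, F and Z are invented here so that the index matches the example, and the helper names `build_index` and `supersets_of` are assumptions of this sketch:

```python
from collections import defaultdict


def build_index(named_sets):
    """Map each element to the names of the stored sets that contain it."""
    index = defaultdict(set)
    for name, members in named_sets.items():
        for element in members:
            index[element].add(name)
    return index


def supersets_of(index, incoming):
    """Names of stored sets containing every element of a non-empty incoming set."""
    postings = [index.get(e, set()) for e in incoming]
    return set.intersection(*postings)


# Hypothetical contents chosen so that a is in A, B, D; b is in B, E, F; c is in A, B, Z.
named_sets = {
    "A": {"a", "c", "x"},
    "B": {"a", "b", "c"},
    "D": {"a", "d"},
    "E": {"b", "e"},
    "F": {"b", "f"},
    "Z": {"c", "z"},
}
index = build_index(named_sets)
print(supersets_of(index, {"a", "b", "c"}))
```

If the result is non-empty, the incoming set is a subset of each set named in it. An element missing from the index yields an empty posting set, so the intersection is empty, as expected.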

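So, that sounds like O(m*log(n)) complexity to me: we have to look up each of the m elements of the incoming set in the index, at O(log(n)) per lookup with a tree-based map (a hashed lookup would be O(1) on average). Note that this counts only the lookups; intersecting the resulting ID sets costs extra, up to O(n) per element in the worst case. Insertions should be on the same order (inserting the new set's ID into each of its elements' maps). (In Big-O analysis, 2*O(m*log(n)) reduces to O(m*log(n)), of course.)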

Jim Dennis answered Oct 20 '22 21:10