Say I have two sets, set1 = {a,b,c,d,e,f}
and set2 = {a,b,c,d,e,g}
. Rather than expressing these explicitly, I want to create something like
common = {a,b,c,d,e}
set1 = common + f
set2 = common + g
If we wanted to represent {a,b,c,h}
, we could represent it as common - d - e + h
.
My goal is basically to be able to generate the optimal common portion to be used. With only one common section this isn't too challenging, but I need to allow more than one (but not unlimited, or the benefits gained would be trivial).
By optimal, I mean "least number of elements expressed". So in the above example, it "costs" 5 (number of elements) to make the common
variable. Then sets 1 and 2 both cost 2 (one to reference common, one to add the extra element), totalling 7. Without the substitution, these would cost 12 to store (6 elements each). Similarly, in subtracting an element from a referenced would "cost" 1.
Another example,
{a,b,c,d}, {a,c,d,e}, {e,f,g,h} and {e,f}
could be
common1 = {a,c,d}
common2 = {e,f,g}
set1 = common1 + b
set2 = common1 + e
set3 = common2 + h
set4 = common2 - g
By allowing multiple common portions this becomes a lot more challenging. Is there a name for this type of problem, or something similar? It seems like it could be related to compression, but I haven't been able to find too many resources on where to start with this.
Some other details that may be relevent:
You could find all atomic sets, that is all sets that are never not seen apart.
{a,b,c,d,e,f,g,h} | {a,b,c,d} = {a,b,c,d},{e,f,g,h}
{a,b,c,d},{e,f,g,h} | {a,c,d,e} = {a,b,c,d},{e,f,g,h}
{a,c,d},{b},{e},{f,g,h} | {e,f,g,h} = {a,c,d},{b},{e},{f,g,h}
{a,c,d},{b},{e},{f,g,h} | {e,f} = {a,c,d},{b},{e},{f},{g,h}
{a,b,c,d} = {a,c,d},{b}
{a,c,d,e} = {a,c,d},{e}
{e,f,g,h} = {e},{f},{g,h}
{e,f} = {e},{f}
This is a little closer but it doesnt solve the minimal breakdown.
I dont think you can find the minimal because i suspect that it is NP-Hard. If you consider a set S and create a graph where each possible subset of S is a node G. Now give a node weight according to the length of the subset, and draw an edge between each node that corresponds to the amount of change. {abc} -> {a} has a weight of 2. {bcd} -> {abe} has a weight of 4. Now to find a minimal solution to the common set problem you need to find a minimal weight spanning tree that covers each of the sets you are interested in. If you find that you can use this to build a minimal common set -- these would be equivelent. Finding minimum weight tree in a node weighted graph is called the Node-Weighed Steiner Tree Problem. A Node weighted Steiner Tree Problem can be shown equivalent to the Steiner Tree Problem. The Steiner Tree problem can be show to be NP-Hard. So I strongly suspect the problem you are trying to solve is NP-Hard.
http://theory.cs.uni-bonn.de/info5/steinerkompendium/node15.html
http://theory.cs.uni-bonn.de/info5/steinerkompendium/node17.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With