Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reconstruct superset from random ordered subsets

Is there an efficient way to reconstruct a superset (for example ordered list {a, b, c, d, e, f, g}) if we have random samples that preserve order of the superset (eg. a comes before b), but are not necessarily constructed from consecutive elements from superset (sample can contain holes, for example {a, c, d})? In addition the sample size may be any length greater or equal to 2 (sample length of one cannot preserve order), but let's assume that during single computation the sample length remains constant (no variable length data).

For example an easy case would be {a, b, c}, {c, d, e} and {e, f, g}, a harder case would be something like {a, f, g}, {b, e, f}, {c, d, e} etc...

What are the limits of such algorithm, eg. minimum number of samples required, given the size of superset and sample length? When data cannot be reconstructed? etc.

like image 370
Nyxeria Avatar asked Dec 03 '25 18:12

Nyxeria


1 Answers

You can iterate over the subsets and use the information contained in each to build a directed graph. If that graph contains all the elements of the superset, and the graph has exactly one path through all elements, then that path is the superset's order.

Practically, when adding information from the ordered subset {a,b,c} to the directed graph, you need to follow these three rules:

  • Add edges a→b and b→c, but not a→c.
  • If there is already a path from a to b, e.g. a→x→y→b, then don't add edge a→b.
  • If there is an edge x→y, and adding edge a→b creates an additional path from x to y, e.g. x→a→b→z→y, then remove the edge x→y.

Example:

Let's say that for the superset {a,b,c,d,e,f,g,h,i,j,k,l} we have added the information from a number of subsets to the graph, and arrived at this situation:

directed graph

The next subset is {b,g,h,k}. There is already a path from b to g, and from h to k, so we don't add the edges b→g or h→k. The edge we do add is g→h:

directed graph adding edge g to h

We have now created an additional path from e to h, e→g→h, so we remove the edge e→h. We have also created an additional path from f to h, f→g→h, so we remove the edge f→h, to get:

directed graph with edges eh and fh removed

Adding information from subset {b,g,h,k} has simplified the graph, and we now know that element g is the seventh element in the ordered superset, because all paths from starting elements a and b to ending elements k and l pass through it, and six elements come before it.

(Note that the elements that all paths pass through: c, g and j, divide the graph into four subgraphs: abc, cdefg, ghij and jkl, which can then be considered seperately, e.g. when looking for edges to remove after a new edge has been added to that subgraph.)

To know the order of the superset, we would need information from additional subsets to simplify the graph to the point where there is a single path through all elements:

directed graph with a single path


A superset with N elements has 2N unique ordered subsets; if you know all those subsets, you can be sure that you know the order of the superset. If you removed all subsets that contain two adjacent elements x and y, the order of those two elements would no longer be known. So in the worst case, you'd need 2N - 2N-2 + 1 subsets to know the order of the superset.

In the best case, you only need one subset, equal to the superset, to know the order of the superset. (And, as you mention in the question, a chain of subsets where the last element of one subset is the first element of the next, is an extension of this best case.)

like image 155


Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!