Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging some sorted lists with unknown order sequence

I've some lists with variable number of elements. Each list is sorted, but the sorting algorithm is not known. I would like to merge the lists into one big list which contains all lists in same order, without duplicates.

Example Input:

  1. XS,M,L,XL
  2. S,M,XXL
  3. XXS,XS,S,L

Expected Result:

  • XXS,XS,S,M,L,XL,XXL

The expected result is obtained by matching up the input sequences in order to obtain a merged result that contains the elements of each input sequence in the correct order, like this:

    XS   M L XL
       S M      XXL
XXS XS S   L
-------------------
XXS XS S M L XL XXL

The function should notify, if there are elements which have ambiguous positions. Here, it would be XXL (it could stay after M,L or XL) and I need to specify its position manually after XL (because here I know the sorting algorithm and can help). I thought about defining pairs of every two elements, each pair in order as in original list. From this one could build the complete list.

like image 278
Gabriel Avatar asked Jan 10 '11 10:01

Gabriel


People also ask

What is the time complexity of merging two sorted lists?

The complexity is O(m log n). There are m iterations of the loop. Each insertion into a sorted array is an O(log n) operation. Therefore the overall complexity is O (m log n).

How many comparisons would be required to merge two sorted lists?

Explanation: To merge two lists of size m and n, we need to do m+n-1 comparisons in worst case.


1 Answers

This can be solved with a Topological Sort algorithm.

If you consider each of your input sequences to be a path through a directed graph, a topological sort will order your set of nodes from left to right in such a way that each directed edge points to the right. Diagram of a directed graph after topological sorting

The wikipedia page on Topological Sorting includes this algorithm, first described by Arthur Kahn in 1962:

L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    insert n into L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else 
    return L (a topologically sorted order)

This algorithm, as written, doesn't actually fail if it finds ambiguous sequences, but that's easy to add by inserting a check at the beginning of the loop, like this:

...
while S is non-empty do
    if S contains more than 1 item
        return error (inputs are ambiguous)
    remove a node n from S
    ...

I don't know what language you're working in, but I've thrown together this PHP implementation as a proof of concept:

function mergeSequences($sequences, $detectAmbiguity = false) {

    // build a list of nodes, with each node recording a list of all incoming edges
    $nodes = array();
    foreach ($sequences as $seq) {
        foreach ($seq as $i => $item) {
            if (!isset($nodes[$item])) $nodes[$item] = array();
            if ($i !== 0) {
                $nodes[$item][] = $seq[$i-1];
            }
        }
    }

    // build a list of all nodes with no incoming edges
    $avail = array();
    foreach ($nodes as $item => $edges) {
        if (count($edges) == 0) {
            $avail[] = $item;
            unset($nodes[$item]);
        }
    }

    $sorted = array();
    $curr = '(start)';
    while (count($avail) > 0) {

        // optional: check for ambiguous sequence
        if ($detectAmbiguity && count($avail) > 1) {
           throw new Exception("Ambiguous sequence: {$curr} can be followed by " . join(' or ', $avail));
        }

        // get the next item and add it to the sorted list
        $curr = array_pop($avail);
        $sorted[] = $curr;

        // remove all edges from the currently selected items to all others
        foreach ($nodes as $item => $edges) {
            $nodes[$item] = array_diff($edges, array($curr));                
            if (count($nodes[$item]) == 0) {
                $avail[] = $item;
                unset($nodes[$item]);
            }
        }

    }

    if (count($nodes) > 0) {
        throw new Exception('Sequences contain conflicting information. Cannot continue after: ' . join(', ', $sorted));
    }

    return $sorted;
}

You can call the function like this:

$input = array(
    array('XS', 'M', 'L', 'XL'),
    array('S', 'M', 'XXL'),
    array('XXS', 'XS', 'S', 'L'),
);
echo(join(', ', mergeSequences($input)));
echo(join(', ', mergeSequences($input, true)));

To get the following output:

XXS, XS, S, M, XXL, L, XL
Uncaught exception 'Exception' with message 'Ambiguous sequence: M can be followed by L or XXL'
like image 87
jcsanyi Avatar answered Nov 09 '22 20:11

jcsanyi