Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging multiple, arbitrarily sorted lists into one list

Given 3 lists which are arbitrarily sorted by the same but unknown sort order. Is there an algorithm that merges these lists into one which is then still sorted by the same order?

Example:

List1: a b c f h

List2: b c e h

List3: c d e f

Assume these lists are sorted but the sort order used is not known. I want to combine these lists to a result that does not contain duplicates but still maintains the sort order: a b c d e f h

As said above: It is known, that the given lists are sorted but it is not known by which order, but the requirement is that the merged list is still sorted by the same (unknown) order.

In the example above, I know that the element "f" is positioned between "e" and "h" because from List1 I know that

"c" < "f" < "h",

from List2 I know that

"c" < "e" < "h"

and from List3 I know that

"e" < "f" and "c" < "e"

which combines to:

"c" < "e" < "f" < "h"

If the sort order cannot be determined by any of the given lists, it is permissible to just append the element to the end of the result list. Also, if the sort order of a sequence of elements cannot be determined it is permissible to insert them in any order into the list as long as they are in the right place (e.g. if I know that "b" and "c" must be inserted between "a" and "d" but I don't know if it should be a b c d or a c b d, then both are permissible.)

Of course this is just an example. The real lists are longer (but contain less than 100 elements), contain not single but multiple character elements and the sort order is not alphabetic. Also, I have got up to 5 lists.

I need to implement this algorithm in Delphi (and no: This is not homework but a real life problem), but I take an algorithm in an language provided it does not contain too much compiler magic or complex library functions.

Performance is not much of an issue because this is done once.

like image 591
dummzeuch Avatar asked Oct 10 '13 14:10

dummzeuch


People also ask

What is multiway merge sort?

In computer science, k-way merge algorithms or multiway merges are a specific type of sequence merge algorithms that specialize in taking in k sorted lists and merging them into a single sorted list. These merge algorithms generally refer to merge algorithms that take in a number of sorted lists greater than two.

How do merge sorts work?

A merge sort uses a technique called divide and conquer. The list is repeatedly divided into two until all the elements are separated individually. Pairs of elements are then compared, placed into order and combined. The process is then repeated until the list is recompiled as a whole.

Does merge sort divide the list?

Conceptually, a merge sort works as follows: Divide the unsorted list into n sublists, each containing one element (a list of one element is considered sorted). Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining. This will be the sorted list.


2 Answers

Your input lists define a partial order of your items. According to an answer at Math.SE, what you want is a topological sort. Algorithms are described on Wikipedia.

like image 144
Rob Kennedy Avatar answered Sep 24 '22 12:09

Rob Kennedy


Nice question. Although topological sort may be the most recommended method, you would have to parse the input first to build up the dependencies list. I thought of a more directed approach, based on finding the items that occur in multiple lists for setting up the order definition.

I am unable to predict anything on time complexity, but since you do not care about performance and especially considering the total number of items being 500 at maximum, I think this algorithm should work nicely.

Algorithm

  • All lists are put together into a temporary list which then is naturally sorted in order to identify and sift out all duplicate items. Those duplicates, named Keys, constitute the sole definition of the final sort order.
  • The Key list is sorted in the input sorting order by comparing every two items: if the two Keys occur in the same input list, then the first of them in that list comes in front of the second in the output list too. If two keys do not occur simultaneously in any input list, then they are considered equal.
  • Subsequently, a loop cycles over the Keys.
  • On every cycle, within each input list each item between the previous Key and the current Key is added to the output list. A cycle ends with the addition of the current Key.

Implementation

type
  TSorterStringList = class(TStringList)
  protected
    Id: Integer;
    KeyId: Integer;
    function Current: String;
  public
    constructor Create;
  end;

  TSorterStringLists = class(TObjectList)
  private
    function GetItem(Index: Integer): TSorterStringList;
  public
    property Items[Index: Integer]: TSorterStringList read GetItem; default;
  end;

  TSorter = class(TObject)
  private
    FInput: TSorterStringLists;
    FKeys: TStringList;
    procedure GenerateKeys;
    function IsKey(const S: String): Boolean;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Sort(Output: TStrings);
    property Input: TSorterStringLists read FInput;
  end;

{ TSorterStringList }

constructor TSorterStringList.Create;
begin
  inherited Create;
  KeyId := -1;
end;

function TSorterStringList.Current: String;
begin
  Result := Strings[Id];
end;

{ TSorterStringLists }

function TSorterStringLists.GetItem(Index: Integer): TSorterStringList;
begin
  if Index >= Count then
    Count := Index + 1;
  if inherited Items[Index] = nil then
    inherited Items[Index] := TSorterStringList.Create;
  Result := TSorterStringList(inherited Items[Index]);
end;

{ TSorter }

constructor TSorter.Create;
begin
  inherited Create;
  FInput := TSorterStringLists.Create(True);
  FKeys := TStringList.Create;
end;

destructor TSorter.Destroy;
begin
  FKeys.Free;
  FInput.Free;
  inherited Destroy;
end;

threadvar
  CurrentSorter: TSorter;

function CompareKeys(List: TStringList; Index1, Index2: Integer): Integer;
var
  Input: TSorterStringLists;
  I: Integer;
  J: Integer;
  K: Integer;
begin
  Result := 0;
  Input := CurrentSorter.Input;
  for I := 0 to Input.Count - 1 do
  begin
    J := Input[I].IndexOf(List[Index1]);
    K := Input[I].IndexOf(List[Index2]);
    if (J > - 1) and (K > -1) then
    begin
      Result := J - K;
      Break;
    end;
  end;
end;

procedure TSorter.GenerateKeys;
var
  All: TStringList;
  I: Integer;
begin
  All := TStringList.Create;
  try
    All.Sorted := True;
    All.Duplicates := dupAccept;
    for I := 0 to FInput.Count - 1 do
      All.AddStrings(FInput[I]);
    for I := 0 to All.Count - 2 do
      if (All[I] = All[I + 1]) then
        if (FKeys.Count = 0) or (FKeys[FKeys.Count - 1] <> All[I]) then
          FKeys.Add(All[I]);
  finally
    All.Free;
  end;
  CurrentSorter := Self;
  FKeys.CustomSort(CompareKeys);
end;

function TSorter.IsKey(const S: String): Boolean;
begin
  Result := FKeys.IndexOf(S) > -1;
end;

procedure TSorter.Sort(Output: TStrings);
var
  KeyId: Integer;
  I: Integer;
  List: TSorterStringList;
begin
  if FInput.Count = 0 then
    Exit;
  Output.BeginUpdate;
  try
    GenerateKeys;
    for KeyId := -1 to FKeys.Count - 1 do
    begin
      for I := 0 to FInput.Count - 1 do
      begin
        List := FInput[I];
        if List.KeyId <= KeyId then
          while (List.Id < List.Count) and not IsKey(List.Current) do
          begin
            Output.Add(List.Current);
            Inc(List.Id);
          end;
        while (List.Id < List.Count) and IsKey(List.Current) do
        begin
          List.KeyId := FKeys.IndexOf(List.Current);
          Inc(List.Id);
        end;
      end;
      if KeyId + 1 < FKeys.Count then
        Output.Add(FKeys[KeyId + 1]);
    end;
  finally
    Output.EndUpdate;
  end;
end;

Example usage

procedure TForm1.Button1Click(Sender: TObject);
var
  Sorter: TSorter;
begin
  Sorter := TSorter.Create;
  try
    Sorter.Input[0].CommaText := '1, 2, 4, 9, 10, 11, 22, 46, 48, 51, 70, 72';
    Sorter.Input[1].CommaText := '3, 9, 23, 43, 44, 45, 47, 48, 51, 71, 90, 91';
    Sorter.Input[2].CommaText := '0, 3, 4, 7, 8, 11, 23, 50, 51, 52, 55, 70';
    Sorter.Input[3].CommaText := '2, 6, 14, 15, 36, 37, 38, 39, 51, 65, 66, 77';
    Sorter.Input[4].CommaText := '5, 27, 120, 130';
    ListBox1.Items.Assign(Sorter.Input[0]);
    ListBox2.Items.Assign(Sorter.Input[1]);
    ListBox3.Items.Assign(Sorter.Input[2]);
    ListBox4.Items.Assign(Sorter.Input[3]);
    ListBox5.Items.Assign(Sorter.Input[4]);
    Sorter.Sort(ListBox6.Items);
    // Results in:
    // 1, 0, 5, 27, 120, 130, 3, 2, 6, 14, 15, 36, 37, 38, 39, 4, 7, 8, 9, 10,
    // 11, 22, 46, 23, 43, 44, 45, 47, 50, 48, 51, 71, 90, 91, 52, 55, 65, 66,
    // 77, 70, 72
  finally
    Sorter.Free;
  end;
end;
like image 40
NGLN Avatar answered Sep 22 '22 12:09

NGLN