Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to merge two lists lacking comparison between them

I am looking for an algorithm to merge two sorted lists, but they lack a comparison operator between elements of one list and elements of the other. The resulting merged list may not be unique, but any result which satisfies the relative sort order of each list will do. More precisely:

Given:

  • Lists A = {a_1, ..., a_m}, and B = {b_1, ..., b_n}. (They may be considered sets, as well).
  • A precedence operator < defined among elements of each list such that a_i < a_{i+1}, and b_j < b_{j+1} for 1 <= i <= m and 1 <= j <= n.
  • The precedence operator is undefined between elements of A and B: a_i < b_j is not defined for any valid i and j.
  • An equality operator = defined among all elements of either A or B (it is defined between an element from A and an element from B).
  • No two elements from list A are equal, and the same holds for list B.

Produce: A list C = {c_1, ..., c_r} such that:

  • C = union(A, B); the elements of C are the union of elements from A and B.
  • If c_p = a_i, c_q = a_j, and a_i < a_j, then c_p < c_q. (The order of elements of the sublists of C corresponding to sets A and B should be preserved.
  • There exist no i and j such that c_i = c_j. (all duplicated elements between A and B are removed).

I hope this question makes sense and that I'm not asking something either terribly obvious, or something for which there is no solution.

Context:

A constructible number can be represented exactly in finitely many quadratic extensions to the field of rational numbers (using a binary tree of height equal to the number of field extensions). A representation of a constructible number must therefore "know" the field it is represented in. Lists A and B represent successive quadratic extensions of the rational numbers. Elements of A and B themselves are constructible numbers, which are defined in the context of previous smaller fields (hence the precedence operator). When adding/multiplying constructible numbers, the quadratically extended fields must first be merged so that the binary arithmetic operations can be performed; the resulting list C is the quadratically extended field which can represent numbers representable by both fields A and B. (If anyone has a better idea of how to programmatically work with constructible numbers, let me know. A question concerning constructible numbers has arisen before, and also here are some interesting responses about their representation.)

Before anyone asks, no, this question does not belong on mathoverflow; they hate algorithm (and generally non-graduate-level math) questions.

In practice, lists A and B are linked lists (stored in reverse order, actually). I will also need to keep track of which elements of C corresponded to which in A and B, but that is a minor detail. The algorithm I seek is not the merge operation in mergesort, because the precedence operator is not defined between elements of the two lists being merged. Everything will eventually be implemented in C++ (I just want the operator overloading). This is not homework, and will eventually be open sourced, FWIW.

like image 670
Victor Liu Avatar asked Jan 25 '10 00:01

Victor Liu


People also ask

How many comparisons are needed to merge the two lists?

Explanation: To merge two lists of size m and n, we need to do m+n-1 comparisons in worst case.

Which method is used to merge two lists?

The addAll() method to merge two lists The addAll() method is the simplest and most common way to merge two lists.

What is the time complexity of merging two lists?

Time Complexity: O(N+M) where N and M are the sizes of linked lists. Space Complexity: O(N+M), creating dummy nodes.


2 Answers

I don't think you can do it better than O(N*M), although I'd be happy to be wrong.

That being the case, I'd do this:

  • Take the first (remaining) element of A.
  • Look for it in (what's left of) B.
    • If you don't find it in B, move it to the output
    • If you do find it in B, move everything from B up to and including the match, and drop the copy from A.
  • Repeat the above until A is empty
  • Move anything left in B to the output

If you want to detect incompatible orderings of A and B, then remove "(what's left of)" from step 2. Search the whole of B, and raise an error if you find it "too early".

The problem is that given a general element of A, there is no way to look for it in B in better than linear time (in the size of B), because all we have is an equality test. But clearly we need to find the matches somehow and (this is where I wave my hands a bit, I can't immediately prove it) therefore we have to check each element of A for containment in B. We can avoid a bunch of comparisons because the orders of the two sets are consistent (at least, I assume they are, and if not there's no solution).

So, in the worst case the intersection of the lists is empty, and no elements of A are order-comparable with any elements of B. This requires N*M equality tests to establish, hence the worst-case bound.

For your example problem A = (1, 2, c, 4, 5, f), B = (a, b, c, d, e, f), this gives the result (1,2,a,b,c,4,5,d,e,f), which seems good to me. It performs 24 equality tests in the process (unless I can't count): 6 + 6 + 3 + 3 + 3 + 3. Merging with A and B the other way around would yield (a,b,1,2,c,d,e,4,5,f), in this case with the same number of comparisons, since the matching elements just so happen to be at equal indices in the two lists.

As can be seen from the example, the operation can't be repeated. merge(A,B) results in a list with an order inconsistent with that of merge(B,A). Hence merge((merge(A,B),merge(B,A)) is undefined. In general, the output of a merge is arbitrary, and if you go around using arbitrary orders as the basis of new complete orders, you will generate mutually incompatible orders.

like image 194
Steve Jessop Avatar answered Oct 19 '22 20:10

Steve Jessop


This sounds like it would use a degenerate form of topological sorting.

EDIT 2:

Now with a combined routine:

import itertools

list1 = [1, 2, 'c', 4, 5, 'f', 7]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

ibase = 0
result = []

for n1, i1 in enumerate(list1):
  for n2, i2 in enumerate(itertools.islice(list2, ibase, None, 1)):
    if i1 == i2:
      result.extend(itertools.islice(list2, ibase, ibase + n2))
      result.append(i2)
      ibase = n2 + ibase + 1
      break
  else:
    result.append(i1)
result.extend(itertools.islice(list2, ibase, None, 1))

print result
like image 42
Ignacio Vazquez-Abrams Avatar answered Oct 19 '22 22:10

Ignacio Vazquez-Abrams