
Finding minimal distance between unsorted and sorted lists

Let A be a list and S a sorted list of the same elements. Assume all elements are different. How do I find a minimal set of "moves" (move X before Y (or end)) that turns A into S?

Examples:

A = [8,1,2,3]
S = [1,2,3,8]

A => S requires one move: 
   move 8 before end

A = [9,1,2,3,0]
S = [0,1,2,3,9]

A => S requires two moves:
   move 9 before 0
   move 0 before 1

I prefer javascript or python, but any language will do.

georg asked Jan 30 '14 09:01



2 Answers

This problem is equivalent to the longest increasing subsequence problem.

Define a comparison operator less: less(a, b) returns true if and only if a comes before b in the target sequence. Using this operator, compute the longest increasing subsequence of the source sequence. Every element that is not part of this subsequence has to be moved (otherwise the subsequence would not be the longest), and each such element needs to be moved exactly once, directly to its target position.

EDIT: As requested by amit, here is my proof of the statement above. Let B denote the target sequence and A the source sequence. Let n = |A| and let k be the length of the longest increasing subsequence as described above.

  • Assume it is possible to reach B from A with fewer than n - k moves. Then at least k + 1 elements of A are not moved. Let s1, s2, ..., sm be the elements that are not moved, in the order they appear in A. By the assumption, m > k. Since these elements have not moved, their positions relative to each other cannot have changed, so their relative order in the target sequence B is the same as in A. Therefore less(si, sj) as defined above is true for every i < j. But then s1, s2, ..., sm is an increasing subsequence, and since m > k this contradicts the assumption that k is the length of the longest increasing subsequence.
  • Now let's show an algorithm that reaches B from A by moving every element except those in the longest increasing subsequence. We process the elements in the order they appear in B and skip those that belong to the longest increasing subsequence. If the current element is the first one in B, we move it to the beginning of the sequence. Otherwise we move it right after the element that precedes it in B. Note that this preceding element may be either the element we moved last or an element of the longest increasing subsequence. At each step, when we are about to move the element with index i in B, the elements with indices 1, 2, ..., i-1 are already in the correct positions relative to each other.
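The counting argument above can be checked with a short Python sketch. It assumes all elements are distinct (as the question states) and uses the standard bisect-based technique to get the length of the longest increasing subsequence; the name min_moves is only illustrative.

```python
from bisect import bisect_left


def min_moves(a):
    """Return len(a) minus the length of the longest increasing subsequence.

    Assumes all elements of a are distinct.
    """
    # rank[x] is the position of x in the sorted target list.
    rank = {x: i for i, x in enumerate(sorted(a))}
    # tails[k] is the smallest rank that ends an increasing
    # subsequence of length k + 1 seen so far.
    tails = []
    for x in a:
        r = rank[x]
        k = bisect_left(tails, r)
        if k == len(tails):
            tails.append(r)
        else:
            tails[k] = r
    return len(a) - len(tails)


print(min_moves([8, 1, 2, 3]))     # 1
print(min_moves([9, 1, 2, 3, 0]))  # 2
```

Both results match the move counts given in the question.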

EDIT: adding some code to make the answer clearer. I am not a JavaScript expert, so feel free to correct or criticize my solution.

Let's define a function transform(a, s) that takes two parameters, the lists a and s as described in the question. First I will create a map positions that maps each element of a to its position in s:

var positions = {};
for (var i = 0; i < s.length; ++i) {
  positions[s[i]] = i; // position of each element in the sorted target list s
}

Now that I have this map, I can define a helper function less as described in my answer above. less takes two values a and b (and the helper map I just created) and returns true if and only if a is before b in s (the target list):

function less(a, b, positions) {
  return positions[a] < positions[b];
}

I will not describe here how to find the maximum (that is, longest) increasing subsequence in a with respect to that comparison operator. You can have a look at this question for a detailed explanation of how to do that. I will simply assume that the following function is defined:

function max_increasing_subsequence(a, positions)

It returns, as a list, the maximum increasing subsequence in a with respect to the comparison operator less as defined above (using positions). I will use your second example to illustrate what we have so far:

A = [9,1,2,3,0]
S = [0,1,2,3,9]

The values in positions will be as follows:

positions = { 0 : 0,
              1 : 1,
              2 : 2,
              3 : 3,
              9 : 4}

And the result of max_increasing_subsequence(a, positions) will be [1, 2, 3]. By the way, if a may contain repeated elements, it may be better to return indices instead of elements from max_increasing_subsequence (in this particular example the difference would not be visible).
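For readers who want something concrete for the helper assumed above, here is one possible sketch in Python (which the question also accepts). It is not the answer's own implementation: it uses the common bisect-plus-back-pointers technique and assumes distinct elements and a positions dictionary like the one shown.

```python
from bisect import bisect_left


def max_increasing_subsequence(a, positions):
    """Return one longest subsequence of a that is increasing by positions[x]."""
    tail_ranks = []        # tail_ranks[k]: smallest target position ending a run of length k + 1
    tail_index = []        # tail_index[k]: index in a of that tail element
    prev = [-1] * len(a)   # back-pointers used to rebuild the subsequence
    for i, x in enumerate(a):
        r = positions[x]
        k = bisect_left(tail_ranks, r)
        if k == len(tail_ranks):
            tail_ranks.append(r)
            tail_index.append(i)
        else:
            tail_ranks[k] = r
            tail_index[k] = i
        prev[i] = tail_index[k - 1] if k > 0 else -1
    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_index[-1] if tail_index else -1
    while i != -1:
        result.append(a[i])
        i = prev[i]
    return result[::-1]


a = [9, 1, 2, 3, 0]
positions = {x: i for i, x in enumerate(sorted(a))}
print(max_increasing_subsequence(a, positions))  # [1, 2, 3]
```

The sketch runs in O(n log n) time. A simpler O(n^2) dynamic program would work equally well for short lists.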

Now I will create another helper map to indicate which are the elements included in the maximum increasing subsequence:

var included = {};
var l = max_increasing_subsequence(a, positions);
for (var i = 0; i < l.length; ++i) {
  included[l[i]] = true;
}

Now you can finish up the solution with a single iteration over s. I will add a special case for the last element to make the code easier to understand:

if (!(s[s.length - 1] in included)) {
  console.log("Move " + s[s.length - 1] + " to the end");
}
for (var i = s.length - 2; i >= 0; --i) {
  if (!(s[i] in included)) {
    console.log("Move " + s[i] + " before " + s[i + 1]);
  }
}

Please note that in the solution above each logged command refers to the ordering of the array a right after all previous commands have been executed.

So in total I believe transform should look something like this:

function transform(a, s) {
  // Map each element to its position in the sorted target list s.
  var positions = {};
  for (var i = 0; i < s.length; ++i) {
    positions[s[i]] = i;
  }
  // Mark the elements of the maximum increasing subsequence; these stay in place.
  var included = {};
  var l = max_increasing_subsequence(a, positions);
  for (var i = 0; i < l.length; ++i) {
    included[l[i]] = true;
  }
  if (!(s[s.length - 1] in included)) {
    console.log("Move " + s[s.length - 1] + " to the end");
  }
  for (var i = s.length - 2; i >= 0; --i) { // note s.length - 2 - don't process last element
    if (!(s[i] in included)) {
      console.log("Move " + s[i] + " before " + s[i + 1]);
    }
  }
}

I hope this code makes my answer clearer.
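To see that the logged commands really sort the list, the same right-to-left pass can be sketched in Python together with a small function that replays the moves. The names plan_moves and apply_moves are hypothetical, and kept stands for the result of the maximum increasing subsequence step, passed in by hand here.

```python
def plan_moves(s, kept):
    """Return (element, before) pairs; before is None for 'move to the end'.

    s is the sorted target list, kept holds the elements that stay in place.
    """
    kept = set(kept)
    moves = []
    # Walk the target list from right to left, as in the loop above.
    for i in range(len(s) - 1, -1, -1):
        if s[i] not in kept:
            before = s[i + 1] if i + 1 < len(s) else None
            moves.append((s[i], before))
    return moves


def apply_moves(a, moves):
    """Replay the moves on a copy of a; each move sees the result of the previous ones."""
    result = list(a)
    for element, before in moves:
        result.remove(element)
        if before is None:
            result.append(element)
        else:
            result.insert(result.index(before), element)
    return result


moves = plan_moves([0, 1, 2, 3, 9], kept=[1, 2, 3])
print(moves)                                # [(9, None), (0, 1)]
print(apply_moves([9, 1, 2, 3, 0], moves))  # [0, 1, 2, 3, 9]
```

The two moves correspond to "move 9 to the end" and "move 0 before 1", which is the same number of moves as in the question's second example.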

Ivaylo Strandjev answered Nov 08 '22 03:11


If you regard your two lists as two strings -- e.g. the numbers are values in ASCII encoding -- then the problem is equivalent to that of finding the operations that allow you to transform the first string into the second one. The number of operations, in turn, is the Levenshtein or edit distance between the strings.

The Levenshtein distance can be found by using dynamic programming, storing in a matrix the distances between all prefixes of both strings, and then tracing back your steps to find at each row of the matrix which is the optimal operation (the one that has needed the least operations to arrive at it).

The longest increasing subsequence algorithm suggested by @IvayloStrandjev is related to the longest common subsequence problem, which in turn is related to the edit distance as an alternative metric that only allows insertion and deletion. It is probably more space-efficient, since it leverages the fact that one of the sequences has to be sorted; I just wanted to provide an alternative answer that I find easier to grasp.

Here is a Python implementation of the full-matrix Levenshtein algorithm, as described in the Wikipedia page linked above (originally found in a 1974 paper by Wagner and Fischer), where a proof of correctness is also supplied. Here we also store the names of the operations in a matrix of the same size as the score matrix, and we print the optimal operation after completing a row.
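The relation between the two approaches can be illustrated with a small Python sketch. When all elements are distinct, the longest common subsequence of A and sorted(A) has the same length as the longest increasing subsequence of A. The function name lcs_length is illustrative, and the code is a plain two-row dynamic program rather than anything taken from either answer.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (two-row DP)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                # Extend the common subsequence ending just before both items.
                cur.append(prev[j - 1] + 1)
            else:
                # Drop the last item of x or of y, whichever keeps more.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


A = [9, 1, 2, 3, 0]
print(lcs_length(A, sorted(A)))           # 3
print(len(A) - lcs_length(A, sorted(A)))  # 2
```

The second value is the number of elements outside the common subsequence, which for this example equals the two moves from the question.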

import argparse

import numpy as np


class Levenshtein(object):
    def __init__(self, string1, string2):
        self.string1 = string1
        self.string2 = string2
        self.scores_matrix = np.zeros(
            (len(self.string1) + 1, len(self.string2) + 1), dtype=np.int16)
        self.operations_matrix = np.empty_like(
            self.scores_matrix, dtype=(np.str_, 16))
        self.total_steps = 0

    def distance(self):
        m = len(self.string1) + 1
        n = len(self.string2) + 1
        for i in range(m):
            self.scores_matrix[i, 0] = i
        for j in range(n):
            self.scores_matrix[0, j] = j
        for j in range(1, n):
            for i in range(1, m):
                if self.string1[i - 1] == self.string2[j - 1]:
                    self.scores_matrix[i, j] = self.scores_matrix[i - 1, j - 1]
                    self.operations_matrix[i, j] = 'match'
                else:
                    self.scores_matrix[i, j] = self.select_operation(i, j)
                if j == n - 1:  # a row is complete
                    self.determine_best_op_and_print(i)
        return self.scores_matrix[m - 1, n - 1]

    def select_operation(self, i, j):
        possible_ops = ['delete', 'insert', 'substitute']
        ops_scores = [
            self.scores_matrix[i - 1, j] + 1,  # deletion
            self.scores_matrix[i, j - 1] + 1,  # insertion
            self.scores_matrix[i - 1, j - 1] + 1]  # substitution
        chosen_op = min(ops_scores)
        chosen_op_name = possible_ops[ops_scores.index(chosen_op)]
        self.operations_matrix[i, j] = chosen_op_name
        return chosen_op

    def determine_best_op_and_print(self, i):
        reversed_row = self.scores_matrix[i][::-1]
        reversed_pos_min = np.argmin(reversed_row)
        pos_min = len(self.scores_matrix[i]) - (reversed_pos_min + 1)
        best_op_name = self.operations_matrix[i, pos_min]
        if best_op_name != 'match':
            self.total_steps += 1
            print(best_op_name, self.string1[i - 1], self.string2[pos_min - 1])


def parse_cli():
    parser = argparse.ArgumentParser()
    parser.add_argument('--list', nargs='*', required=True)
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_cli()
    A = args.list
    S = sorted(A)
    lev = Levenshtein(A, S)
    dist = lev.distance()
    print("{} total steps were needed; edit distance is {}".format(
        lev.total_steps, dist))

Here is how to run the code with the examples you provide, and the output expected:

$ python levenshtein.py --list 8 1 2 3
substitute 8 1
1 total steps were needed; edit distance is 2

$ python levenshtein.py --list 9 1 2 3 0
substitute 9 0
substitute 0 9
2 total steps were needed; edit distance is 2
logc answered Nov 08 '22 04:11