Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Maximizing difference between numbers in a sequence

I need some help in finding the general idea for an algorithm to solve the following problem. The problem has been given to me in an assignment. It looks like it should be solvable through a greedy method, but I can't figure out a simple solution. Here's the problem description:

You are given a sequence of N numbers a_1 ... a_n such that 0 = a_1 < a_2 < ... < a_n. You must eliminate at most M of these numbers such that the minimum difference a_i+1 - a_i between any two consecutive numbers is maximized.

You may not eliminate the first and last elements, a_0 and a_n. Also you must eliminate as few numbers as possible: if eliminating M - 1 you get the shortest distance to be D and eliminating M you still have the same minimum difference, you must not eliminate this last number.

I'm not asking for a complete solution to this problem. Only a few guidelines as to how the algorithm might look.

Edit: Some test samples. Keep in mind that there may be multiple valid solutions.

Remove at most 7 from:
0 3 7 10 15 18 26 31 38 44 53 60 61 73 76 80 81 88 93 100

Solution:
0 7 15 26 31 38 44 53 60 73 80 88 93 100
Remove at most 8 from:
0 3 7 10 15 26 38 44 53 61 76 80 88 93 100

Solution:
0 15 38 53 76 88 100
like image 765
pulagroasa Avatar asked Mar 14 '13 20:03

pulagroasa


1 Answers

[EDIT: I originally claimed that ElKamina's answer was wrong, but I've now convinced myself that not only is it correct, it's virtually the same as my (later) answer :-P Still a bit terse for my taste though!]

Here's an exact O(NM^2)-time, O(NM)-space dynamic programming algorithm that gets the right answer on all the OP's examples in milliseconds. The basic ideas are that:

  1. Whenever we impose the constraint that a particular number should not be deleted, it forms a "fence" between two subproblems in such a way that solving each subproblem optimally guarantees an optimal solution to the overall problem with that constraint in place, and
  2. Every optimal solution must begin with a non-deleted number, followed by some number of consecutive deleted numbers, followed by a non-deleted number, followed by an optimal solution to the remainder of the problem that begins at the second non-deleted number and uses an appropriately reduced M.

In the following, x[i] means the ith number in the list, with indexing starting from 0.

The recursion

Let f(i, j) be the optimal (largest minimum) interval size obtainable from the suffix of the number list starting at position 0 <= i < N by keeping this (i.e. the ith) number and deleting exactly j of the later (not necessarily consecutive) numbers. The following recursion shows how this can be computed:

f(i, j) = max(g(i, j, d)) over all 0 <= d <= min(j, N-i-2)
g(i, j, d) = min(x[i+d+1] - x[i], f(i+d+1, j-d))

The min(j, N-i-2) is there instead of just plain j to prevent deletions "past the end". The only base cases we need are:

f(N-1, 0) = infinity (this has the effect of making min(f(N-1), 0), z) = z)
f(N-1, j > 0) = 0 (this case only arises if M > N - 2)

How it works

In more detail, to calculate f(i, j), what we do is loop over all possible numbers (including zero) of consecutive deletions starting at position i+1, in each case calculating the minimum of (a) the interval formed by this block of deletions and (b) the optimal solution to the subproblem beginning on the first undeleted number to the right of this block. It's important to specify that the first number in a block (x[i]) is not deleted, so that the previous (parent) subproblem's interval is always "capped". This is a tricky part that took me a while to figure out.

Making it fast

If you code up the plain recursion above it will work, but it will take time exponential in M and N. By memoising f(), we guarantee that its body will be run at most N * M times (once per distinct parameter combination). Each time the function runs it performs O(M) work scanning through increasingly long blocks of deletions, for O(NM^2) time overall.

You cannot create a larger gap by using fewer deletions, so the overall largest minimum interval size can be found by looking through the M+1 results f(0, M), f(0, M-1), ..., f(0, 0) for the first number that is smaller than a previous number: that previous number is the answer, and the second argument to f() is the minimum number of deletions required. To find an optimal solution (i.e. a list of the particular numbers deleted), you can record decisions made in a separate predecessor array, so that p[i, j] gives the value of d (which can be turned into the previous values of i and j) that led to the the optimal solution for f(i, j). (Perhaps "predecessor" is confusing here: it refers to the subproblems that are solved before the current subproblem, although these subproblems appear "after" (to the right of) the suffix representing the current subproblem.) These links can then be followed to recover the delete/don't-delete decisions made.

Working C++ code

http://ideone.com/PKfhDv

Addendum: Early missteps

With a tricky problem like this, it can be helpful to look at wrong approaches and see exactly where they went wrong... :-/ I thought I'd solved the problem, but I hadn't noticed the requirement to return a solution that uses as few deletions as possible, and my initial attempts to fix this didn't work.

At first I tried defining f(i, j) to be the optimal (largest minimum) interval size obtainable from the suffix of the number list starting at position 0 <= i < N by keeping this (i.e. the ith) number and deleting anywhere from 0 up to j of the later (not necessarily consecutive) numbers. But this caused a subtle problem: it's not necessarily the case that you can assemble an optimal solution to this from optimal solutions to subproblems. I initially thought this could be fixed by changing the function to return an (interval size, minimum number of deletions needed to achieve that interval size) pair instead of just an interval size, and having it break ties between actions that share the maximum minimum interval size by always choosing the action that minimises the number of deletions. But this is not true in general, because the optimal solution to the subproblem (i.e. to some suffix of the number list) will spend deletions making the minimum interval size in that region as large as possible, even if these deletions turn out to be wasted because the prefix of the full solution will force the overall minimum to be lower anyway. Here's a counterexample using an f() that returns (interval size, minimum number of deletions needed to achieve that size) pairs:

Problem: M = 1, X = [10 15 50 55].

f(2, 0) = (5, 0) (leaving [50 55])
f(1, 1) = (40, 1) (delete 50 to leave [15 55]); *locally* this appears better
          than not deleting anything, which would leave [15 50 55] and yield
          a min-gap of 5, even though the latter would be a better choice for
          the overall problem)
f(0, 1) = max(min(5, f(1, 1)), min(40, f(2, 0))
        = max(min(5, 40), min(40, 5))
        = (5, 1) (leaving either [10 15 55] or [10 50 55])

I haven't shown the working for the second element of the pair returned by f(0, 1) because it's hard to express concisely, but obviously it will be 1 because both alternatives tried need 1 deletion.

like image 97
j_random_hacker Avatar answered Sep 28 '22 01:09

j_random_hacker