Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Correctness of greedy algorithm

In non-decreasing sequence of (positive) integers two elements can be removed when enter image description here. How many pairs can be removed at most from this sequence?

So I have thought of the following solution:

  • I take given sequence and divide into two parts (first and second).
  • Assign to each of them iterator - it_first := 0 and it_second := 0, respectively. count := 0
  • when it_second != second.length
    • if 2 * first[it_first] <= second[it_second]
      • count++, it_first++, it_second++
    • else
      • it_second++
  • count is the answer

Example:

count := 0

[1,5,8,10,12,13,15,24] --> first := [1,5,8,10], second := [12,13,15,24]

  1. 2 * 1 ?< 12 --> true, count++, it_first++ and it_second++

  2. 2 * 5 ?< 13 --> true, count++, it_first++ and it_second++

  3. 2 * 8 ?< 15 --> false, it_second++

  4. 8 ?<24 --> true, count ++it_second reach the last element - END.

  5. count == 3

Linear complexity (the worst case when there are no such elements to be removed. n/2 elements compare with n/2 elements). So my missing part is 'correctness' of algorithm - I've read about greedy algorithms proof - but mostly with trees and I cannot find analogy. Any help would be appreciated. Thanks!

EDIT: By correctness I mean: * It works * It cannot be done faster(in logn or constant)

I would like to put some graphics but due to reputation points < 10 - I can't. (I've meant one latex at the beginning ;))

like image 831
lemoid Avatar asked Mar 11 '15 21:03

lemoid


2 Answers

  1. Correctness:

    • Let's assume that the maximum number of pairs that can be removed is k. Claim: there is an optimal solution where the first elements of all pairs are k smallest elements of the array. Proof: I will show that it is possible to transform any solution into the one that contains the first k elements as the first elements of all pairs.

      1. Let's assume that we have two pairs (a, b), (c, d) such that a <= b <= c <= d, 2 * a <= b and 2 * c <= d. In this case, pairs (a, c) and (b, d) are valid, too. And now we have a <= c <= b <= d. Thus, we can always transform out pairs in such a way that the first element from any pair is not greater than the second element of any pair.

      2. When we have this property, we can simply substitute the smallest element among all first all elements of all pairs with the smallest element in the array, the second smallest among all first elements - with the second smallest element in the array and so on without invalidating any pair.

    • Now we know that there is an optimal solution that contains k smallest elements. It is clear that we cannot make the answer worse by taking the smallest unused element(making it bigger can only reduce the answer for the next elements) which fits each of them. Thus, this solution is correct.

    • A note about the case when the length of the array is odd: it doesn't matter where the middle element goes: to the first or to the second half. In the first half it is useless(there are not enough elements in the second half). If we put it to the second half, it is useless two(let's assume that we took it. It means that there is "free space" somewhere in the second half. Thus, we can shift some elements by one and get rid of it).

  2. Optimality in terms of time complexity: the time complexity of this solution is O(n). We cannot find the answer without reading the entire input in the worst case and reading is already O(n) time. Thus, this algorithm is optimal.

like image 163
kraskevich Avatar answered Sep 23 '22 09:09

kraskevich


Presuming your method. Indices are 0-based.

Denote in general:

  • end_1 = floor(N/2) boundary (inclusive) of first part.

Denote while iterating:

  • i index in first part, j index in second part,
  • optimal solution until this point sol(i,j) (using algorithm from front),
  • pairs that remain to be paired-up optimally behind (i,j) point i.e. from (i+1,j+1) onward rem(i,j) (can be calculated using algorithm from back),
  • final optimal solution can be expressed as the function of any point as sol(i,j) + rem(i,j).

Observation #1: when doing algorithm from front all points in [0, i] range are used, some points from [end_1+1, j] range are not used (we skip a(j) not large engough). When doing algorithm from back some [i+1, end_1] points are not used, and all [j+1, N] points are used (we skip a(i) not small enough).

Observation #2: rem(i,j) >= rem(i,j+1), because rem(i,j) = rem(i,j+1) + M, where M can be 0 or 1 depending on whether we can pair up a(j) with some unused element from [i+1, end_1] range.

Argument (by contradiction): let's assume 2*a(i) <= a(j) and that not pairing up a(i) and a(j) gives at least as good final solution. By the algorithm we would next try to pair up a(i) and a(j+1). Since:

  • rem(i,j) >= rem(i,j+1) (see above),
  • sol(i,j+1) = sol(i,j) (since we didn't pair up a(i) and a(j))

we get that sol(i,j) + rem(i,j) >= sol(i,j+1) + rem(i,j+1) which contradicts the assumption.

like image 26
plesiv Avatar answered Sep 19 '22 09:09

plesiv