Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Basic algorithm proof

I have a sequence of numbers, for example: 170, 205, 225, 190, 260, 130, 225, 160 and I have to split them into sets with fixed number of elements so that the maximal difference between elements of the sets is minimized.

I have a guarantee that if I need to split the elements into sets of K elements, then the global number of elements equals Z * K.

For the example with K = 4, the optimal splitting can be perfomed this way:

(1) : 130 160 170 190 (max difference equals 60)

(2) : 205 225 225 260 (max difference equals 55)

So, the global max difference for this case equals 60.


Now, the question: is my assumption that I can simply sort the initial data and split it into even parts starting from the beginning? If this is correct, how can I prove it? And if it's not, which approach should I use to solve this problem?

like image 618
Yippie-Ki-Yay Avatar asked Nov 01 '11 06:11

Yippie-Ki-Yay


1 Answers

Assuming your amount of numbers can always be divided exactly by K (so not 13 numbers in sets of 4), it is correct.

Through sorting, you get the most similar numbers as close to each other as possible, obviously. The question is, if numbers are moved to bring the worst placed value in a set with closer values, does that make the maximum difference smaller?

The answer is no. When sorted, the only values at the left of the of number are equal or lower, the number that would move to the left would be surrounded by lower values. Of the two numbers that caused the maximum difference, at least one would get an even worse partner, meaning your max distance would become higher. It works the same way with the higher numbers to the right.

Sorted:
[lowest, low, low, x] distance1 = x-lowest
[y, high, high, highest] distance2 = highest-y

Swapped:
[lowest, low, low, y] distance3 = y-lowest
[x, high, high, highest] distance4 = highest-x

Since x < y (let's assume they're not equal since swapping would be pointless), distance3 > distance1 and distance4 > distance2, meaning things got worse.

It works the same way if you place a higher value there.

No matter how off a number is, putting another number at that spot will make them more off.

Another option is to move the whole subset one space left:

[lowest, low, low, y]
[high, high, highest, x]

But this is actually the same result as swapping.

So that's how it works with 2 sets.

With three sets:

[lowest, low, low, x]
[lowM, lowM, highM, highM]
[highM, y, high, highest]

Swapping x and y is the same as earlier. Even if x is very similar or even equal to the bottom left high (if the middle lows and highs are actually equal), y is still higher than x, making the difference to lowest greater, and x is further away from highest.

Moving a bunch of numbers forwards:

[lowest, low, lowM, lowM]
[highM, highM, highM, y]
[x, highM, high, highest];

Perhaps the biggest difference was between highM and highest, and that distance is now removed. But since you can only move it away from highest by placing an even lower value there, you're always making it worse. The highest distance highest-highM is now highest-x, and x < highM.

It still works the other way round as well. Had there been a next set, highM could be swapped with a higher number closer to highest, but this would place highM with even higher numbers causing an even greater difference.

So yes, sorting the data and then splitting it in equal parts always gives you the minimal maximum difference, because changing the sorted sets always gives worse results.

Note: If the numbers are not dividable by K, it gets way more complicated, you'll have to figure out the worst set and see if you can move its highest or lowest number to a next or previous set without making another set a worse difference. The rule that you can only swap low numbers with high numbers is removed, since you can swap them with no-numbers, so proving things about that is a whole new level.

like image 51
Aberrant Avatar answered Oct 03 '22 03:10

Aberrant