Given an array a
and integer k
. Someone uses following algorithm to get first k smallest elements:
cnt = 0
for i in [1, k]:
for j in [i + 1, n]:
if a[i] > a[j]:
swap(a[i], a[j])
cnt = cnt + 1
The problem is: How to calculate value of cnt
(when we get final k-sorted array), i.e. the number of swaps, in O(n log n)
or better ?
Or simply put: calculate the number of swaps needed to get first k-smallest number sorted using the above algorithm, in less than O(n log n)
.
I am thinking about a binary search tree, but I get confused (How array will change when increase i ? How to calculate number of swap for a fixed i ?...).
In bubble sort, Number of swaps required = Number of inversion pairs. When an array is sorted in descending order, the number of inversion pairs = n(n-1)/2 which is maximum for any permutation of array.
Worst Case: Bubble sort worst case swaps n(n-1)/2. The smallest element is at end of the list so it needs the n(n-1)/2 swaps are required.
Explanation: Even though the first two elements are already sorted, bubble sort needs 4 iterations to sort the given array.
Three passes will be required; First Pass. ( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and swaps since 5 > 1. ( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4.
This is a very good question: it involves Inverse Pairs, Stack and some proof techniques.
Note 1: All index used below are 1-based, instead of traditional 0-based.
Note 2: If you want to see the algorithm directly, please start reading from the bottom.
First we define Inverse Pairs as:
For
a[i]
anda[j]
, in whichi < j
holds, if we havea[i] > a[j]
, thena[i]
anda[j]
are called an Inverse Pair.
For example, In the following array:
3 2 1 5 4
a[1]
and a[2]
is a pair of Inverse Pair, a[2]
and a[3]
is another pair.
Before we start the analysis, let's define a common language: in the reset of the post, "inverse pair starting from i
" means the total number of inverse pairs involving a[i]
.
For example, for a = {3, 1, 2}
, inverse pair starting from 1 is 2, and inverse pair starting from 2 is 0.
Now let's look at some facts:
i < j < k
, and a[i] > a[k]
, a[j] > a[k]
, swap a[i]
and a[j]
(if they are an inverse pair) won't affect the total number of inverse pair starting from j
;a = {5, 3, 4}
, before a[1]
is swapped with a[2]
, total number of inverse pair starting from 1 is 2, but after swap, array becomes a = {3, 5, 4}
, and the number of inverse pair starting from 1 becomes 1);A
and 2 numbers, a
and b
, as the head element of A
, if we can form more inverse pair with a
than b
, we have a > b
;i
as ip[i]
, then we have: if k
is the min number satisfies ip[i] > ip[i + k]
, then a[i] > a[i + k]
while a[i] < a[i + 1 .. i + k - 1]
must be true. In words, if ip[i + k]
is the first number smaller than ip[i]
, a[i + k]
is also the first number smaller than a[i]
;Proof of point 1:
By definition of inverse pair, for all a[k]
, k > j
that forms inverse pair with a[j]
, a[k] < a[j]
must hold. Since a[i]
and a[j]
are a pair of inverse and provided that i < j
, we have a[i] > a[j]
. Therefore, we have a[i] > a[j] > a[k]
, which indicates the inverse-pair-relationships are not broken.
Proof of point 3:
Leave as empty since quite obvious.
Proof of point 4:
First, it's easy to see that when i < j
, a[i] > a[j]
, we have ip[i] >= ip[j] + 1 > ip[j]
. Then, it's inverse-contradict statement is also true, i.e. when i < j
, ip[i] <= ip[j]
, we have a[i] <= a[j]
.
Now back to the point. Since k is the min number to satisfy ip[i] > ip[i + k]
, then we have ip[i] <= ip[i + 1 .. i + k - 1]
, which indicates a[i] <= a[i + 1.. i + k - 1]
by the lemma we just proved, which also indicates there's no inverse pairs in the region [i + 1, i + k - 1]
. Therefore, ip[i]
is the same as the number of inverse pairs starting from i + k
, but involving a[i]
. Given ip[i + k] < ip[i]
, we know a[i + k]
has less inverse pair than a[i]
in the region of [i + k + 1, n]
, which indicates a[i + k] < a[i]
(by Point 3).
You can write down some sequences and try out the 4 facts mentioned above and convince yourself or disprove them :P
Now it's about the algorithm.
A naive implementation will take O(nk)
to compute the result, and the worst case will be O(n^2)
when k = n
.
But how about we make use of the facts above:
First we compute ip[i]
using Fenwick Tree (see Note 1 below), which takes O(n log n)
to construct and O(n log n)
to get all ip[i]
calculated.
Next, we need to make use of facts. Since swap of 2 numbers only affect current position's inverse pair number but not values after (point 1 and 2), we don't need to worry about the value change. Also, since the nearest smaller number to the right shares the same index in ip
and a
, we only need to find the first ip[j]
that is smaller than ip[i]
in [i + 1, n]
. If we denote the number of swaps to get first i
element sorted as f[i]
, we have f[i] = f[j] + 1
.
But how to find this "first smaller number" fast? Use stack! Here is a post which asks a highly similar problem: Given an array A,compute B s.t B[i] stores the nearest element to the left of A[i] which is smaller than A[i]
In short, we are able to do this in O(n)
.
But wait, the post says "to the left" but in our case it's "to the right". The solution is simple: we do backward in our case, then everything the same :D
Therefore, in summary, the total time complexity of the algorithm is O(n log n) + O(n) = O(n log n)
.
Finally, let's talk with an example (a simplified example of @make_lover's example in the comment):
a = {2, 5, 3, 4, 1, 6}
, k = 2
First, let's get the inverse pairs:
ip = {1, 3, 1, 1, 0, 0}
To calculate f[i]
, we do backward (since we need to use the stack technique):
f[6] = 0, since it's the last one
f[5] = 0, since we could not find any number that is smaller than 0
f[4] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[3] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[2] = f[3] + 1 = 2, since ip[3] is the first smaller number to the right
f[1] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
Therefore, ans = f[1] + f[2] = 3
Note 1: Using Fenwick Tree (Binary Index Tree) to get inverse pair can be done in O(N log N), here is a post on this topic, please have a look :)
Update
Aug/20/2014: There was a critical error in my previous post (thanks to @make_lover), here is the latest update.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With