Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of swaps to sort first k-smallest element using a bubble sort like algorithm

Given an array a and integer k. Someone uses following algorithm to get first k smallest elements:

cnt = 0
for i in [1, k]:
    for j in [i + 1, n]:
        if a[i] > a[j]:
            swap(a[i], a[j])
            cnt = cnt + 1

The problem is: How to calculate value of cnt (when we get final k-sorted array), i.e. the number of swaps, in O(n log n) or better ?

Or simply put: calculate the number of swaps needed to get first k-smallest number sorted using the above algorithm, in less than O(n log n).

I am thinking about a binary search tree, but I get confused (How array will change when increase i ? How to calculate number of swap for a fixed i ?...).

like image 867
make_lover Avatar asked Aug 17 '14 07:08

make_lover


People also ask

How do you count the number of swaps in bubble sort?

In bubble sort, Number of swaps required = Number of inversion pairs. When an array is sorted in descending order, the number of inversion pairs = n(n-1)/2 which is maximum for any permutation of array.

What is the number of swaps required to sort an element using bubble sort in the worst case?

Worst Case: Bubble sort worst case swaps n(n-1)/2. The smallest element is at end of the list so it needs the n(n-1)/2 swaps are required.

How many swaps are required to sort the given array using bubble sort -{ 2 5 1 3 4?

Explanation: Even though the first two elements are already sorted, bubble sort needs 4 iterations to sort the given array.

How many passes and swaps are required in bubble sort?

Three passes will be required; First Pass. ( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and swaps since 5 > 1. ( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4.


1 Answers

This is a very good question: it involves Inverse Pairs, Stack and some proof techniques.

Note 1: All index used below are 1-based, instead of traditional 0-based.

Note 2: If you want to see the algorithm directly, please start reading from the bottom.

First we define Inverse Pairs as:

For a[i] and a[j], in which i < j holds, if we have a[i] > a[j], then a[i] and a[j] are called an Inverse Pair.

For example, In the following array:

3 2 1 5 4

a[1] and a[2] is a pair of Inverse Pair, a[2] and a[3] is another pair.

Before we start the analysis, let's define a common language: in the reset of the post, "inverse pair starting from i" means the total number of inverse pairs involving a[i].

For example, for a = {3, 1, 2}, inverse pair starting from 1 is 2, and inverse pair starting from 2 is 0.

Now let's look at some facts:

  1. If we have i < j < k, and a[i] > a[k], a[j] > a[k], swap a[i] and a[j] (if they are an inverse pair) won't affect the total number of inverse pair starting from j;
  2. Total inverse pairs starting from i may change after a swap (e.g. suppose we have a = {5, 3, 4}, before a[1] is swapped with a[2], total number of inverse pair starting from 1 is 2, but after swap, array becomes a = {3, 5, 4}, and the number of inverse pair starting from 1 becomes 1);
  3. Given an array A and 2 numbers, a and b, as the head element of A, if we can form more inverse pair with a than b, we have a > b;
  4. Let's denote the total number of inverse pair starting from i as ip[i], then we have: if k is the min number satisfies ip[i] > ip[i + k], then a[i] > a[i + k] while a[i] < a[i + 1 .. i + k - 1] must be true. In words, if ip[i + k] is the first number smaller than ip[i], a[i + k] is also the first number smaller than a[i];

Proof of point 1:

By definition of inverse pair, for all a[k], k > j that forms inverse pair with a[j], a[k] < a[j] must hold. Since a[i] and a[j] are a pair of inverse and provided that i < j, we have a[i] > a[j]. Therefore, we have a[i] > a[j] > a[k], which indicates the inverse-pair-relationships are not broken.

Proof of point 3:

Leave as empty since quite obvious.

Proof of point 4:

First, it's easy to see that when i < j, a[i] > a[j], we have ip[i] >= ip[j] + 1 > ip[j]. Then, it's inverse-contradict statement is also true, i.e. when i < j, ip[i] <= ip[j], we have a[i] <= a[j].

Now back to the point. Since k is the min number to satisfy ip[i] > ip[i + k], then we have ip[i] <= ip[i + 1 .. i + k - 1], which indicates a[i] <= a[i + 1.. i + k - 1] by the lemma we just proved, which also indicates there's no inverse pairs in the region [i + 1, i + k - 1]. Therefore, ip[i] is the same as the number of inverse pairs starting from i + k, but involving a[i]. Given ip[i + k] < ip[i], we know a[i + k] has less inverse pair than a[i] in the region of [i + k + 1, n], which indicates a[i + k] < a[i] (by Point 3).

You can write down some sequences and try out the 4 facts mentioned above and convince yourself or disprove them :P

Now it's about the algorithm.

A naive implementation will take O(nk) to compute the result, and the worst case will be O(n^2) when k = n.

But how about we make use of the facts above:

First we compute ip[i] using Fenwick Tree (see Note 1 below), which takes O(n log n) to construct and O(n log n) to get all ip[i] calculated.

Next, we need to make use of facts. Since swap of 2 numbers only affect current position's inverse pair number but not values after (point 1 and 2), we don't need to worry about the value change. Also, since the nearest smaller number to the right shares the same index in ip and a, we only need to find the first ip[j] that is smaller than ip[i] in [i + 1, n]. If we denote the number of swaps to get first i element sorted as f[i], we have f[i] = f[j] + 1.

But how to find this "first smaller number" fast? Use stack! Here is a post which asks a highly similar problem: Given an array A,compute B s.t B[i] stores the nearest element to the left of A[i] which is smaller than A[i]

In short, we are able to do this in O(n).

But wait, the post says "to the left" but in our case it's "to the right". The solution is simple: we do backward in our case, then everything the same :D

Therefore, in summary, the total time complexity of the algorithm is O(n log n) + O(n) = O(n log n).

Finally, let's talk with an example (a simplified example of @make_lover's example in the comment):

a = {2, 5, 3, 4, 1, 6}, k = 2

First, let's get the inverse pairs:

ip = {1, 3, 1, 1, 0, 0}

To calculate f[i], we do backward (since we need to use the stack technique):

f[6] = 0, since it's the last one
f[5] = 0, since we could not find any number that is smaller than 0
f[4] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[3] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right
f[2] = f[3] + 1 = 2, since ip[3] is the first smaller number to the right
f[1] = f[5] + 1 = 1, since ip[5] is the first smaller number to the right

Therefore, ans = f[1] + f[2] = 3

Note 1: Using Fenwick Tree (Binary Index Tree) to get inverse pair can be done in O(N log N), here is a post on this topic, please have a look :)

Update

Aug/20/2014: There was a critical error in my previous post (thanks to @make_lover), here is the latest update.

like image 158
nevets Avatar answered Oct 14 '22 06:10

nevets