Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the kth quantiles of an n-elements set. (From cormen)

Tags:

algorithm

The kth quantiles of an n-element set are the k - 1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(n lg k)-time algorithm to list the kth quantiles of a set.

The straight forward solution would be to select every k, 2k, 3k .. ik the smallest element, whose running time is O(kn) (k calls to select procedure of O(n)). But this can be optimized to do better than O(kn). After finding the median of medians at the index 'i' in the select procedure, we make the following recursive call.

if the index of the median of medians i is > k, recursively call select the kth smallest element in the left subarray A[0 ... i]

if the i is < k, recursively select the n - i + k th smallest element in the right subarray A[i+1 ... n].

Can the above recursive calls be modified as below which would reduce the factor 'k' to 'log k' ?

if the index of the median of medians i is > k, recursively select the kth smallest element in the left subarray A[0 ... i] and also recursively select the n - k th smallest element in the right subarray A[i+1 ... n].

if the i is < k, recursively select the n - i + k th smallest element in the right subarray A[i+1 ... n] and also recursively select k th smallest element in the left subarray A[0 ... i].

The main call would be just select(A, k, n).

like image 403
user472402 Avatar asked Oct 11 '10 14:10

user472402


3 Answers

Note that we use a modified PARTITION that is given an index to the pivot to be used as its last input parameter.

You start with KTH-QUANTILES(A, 1, n, 1, k-1, k)

KTH-QUANTILES(A, p, r, i, j, k)
n=A.length
m=floor((i+j)/2)
q=floor(m(n/k))
q=q-p+1
q=SELECT(A, p, r, q)
q=PARTITION(A, p, r, q)
if i<m
    L=KTH-QUANTILES(A, p, q-1, i, m-1, k)
if m<j
    R=KTH-QUANTILES(A, q+1, r, m+1, j, k)
return L U A[q] U R

The depth of the recursion tree is lg k, since the partition is done around the median of the given order statistics (from i to j).

On each level of the recursion tree there are Θ(n) operations, so the running time is Θ(nlgk).

like image 150
Avi Cohen Avatar answered Nov 18 '22 04:11

Avi Cohen


I haven't gone through your approach, but this is a question from Int. to Algorithms by Cormen. Anyhow I was browsing for a solution myself, and I would love to share my version of the algorithm. Try to refute it's correctness:

I will assume we have a O(n) statistic finding algorithm. So I can find k-th statistic in O(n) time. Suppose I say that I will find all n/k k-th quantiles using the divide-and-conquer such that:

If I have n' elements, I split the array into n'/2 parts, report the boundary k-th quantiles for both n'/2 partitions. And report the remaining quantiles recursively. In essence what I am doing is, after partitioning using median, I will extract the rightmost quantile from left array, leftmost quantile from right partition and after trimming these arrays run the algorithm recursively. My complexity analysis comes out to be:

T(n,k) = 2*T(n/2,k/2) + O(n).

This turns out to be O(nlogk) as the k/2 part will converge faster, though you might want to solve that more rigorously. Also we have used that n>k( obvious from the problem. Note that the task of extracting 2 quantiles and trimming the array will be done in O(n)

like image 2
sanket Avatar answered Nov 18 '22 03:11

sanket


Without loss of generality, assume that n and k are powers of 2.

We first find the n/2th order statistic, in time O(n) using SELECT, then reduce the problem to finding the k/2th quantiles of the smaller n/2 elements and the k/2th quantiles of the larger n/2 elements.

Let T(n) denote the time it takes the algorithm to run on input of size n.
Then T(n) = cn + 2T(n/2) for some constant c, and the base case is T(n/k) = O(1).

Then we have: T(n) ≤ cn + 2T(n/2) ≤ 2cn + 4T(n/4) ≤ 3cn + 8T(n/8) . . . ≤ log(k)cn + kT(n/k) ≤ log(k)cn + O(k) = O(nlogk).

cc. Solution Manual

like image 1
Omid Halimi Milani Avatar answered Nov 18 '22 02:11

Omid Halimi Milani