Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Queue data structure supporting fast k-th largest element finding

Tags:

I'm faced with a problem which requires a Queue data structure supporting fast k-th largest element finding.

The requirements of this data structure are as follows:

  1. The elements in the queue are not necessarily integers, but they must be comparable to each other, i.e we can tell which one is greater when we compare two elements(they can be equal as well).

  2. The data structure must support enqueue(adds the element at the tail) and dequeue(removes the element at the head).

  3. It can quickly find the k-th largest element in the queue, pls note k is not a constant.

  4. You can assume that operations enqueue , dequeue and k-th largest element finding all occur with the same frequency.

enter image description here

My idea is to use a modified balanced binary search tree. The tree is the same as ordinary balanced binary search tree except that every nodei is augmented with another field ni, ni denotes the number of nodes contained in the subtree with root nodei. The aforementioned operations are supported as follows:

For simplicity assume that all elements are distinct.

Enqueue(x): x is first inserted into the tree, suppose the corresponding node is nodet, we append pair(x,pointer to nodet) to the queue.

Dequeue: suppose (e1, node1) is the element at the head, node1 is the pointer into the tree corresponding to e1. We delete node1 from the tree and remove (e1, node1) from the queue.

K-th largest element finding: suppose root node is noderoot, its two children are nodeleft and noderight(suppose they all exist), we compare K with nroot , three cases may happen:

  1. if K< nleft we find the K-th largest element in the left subtree of nroot;

  2. if K>nroot-nright we find the (K-nroot+nright)-th largest element in the right subtree of nroot;

  3. otherwise nroot is the node we want.

The time complexity of all the three operations are O(logN) , where N is the number of elements currently in the queue.

How can I speed up the operations mentioned above? With what data structures and how?

like image 469
outlaw Avatar asked Sep 21 '12 14:09

outlaw


2 Answers

Note - you cannot achieve better then O(logn) for all, at best you need to "chose" which op you care for the most. (Otherwise, you could sort in O(n) by feeding the array to the DS, and querying 1st, 2nd, 3rd, ... nth elements)

  • Using a skip list instead of a Balanced BST as the sorted structure can reduce dequeue complexity to O(1) average case. It does not affect complexity of any other op.
    To remove from a skip list - all you need to do is to get to the element using the pointer from the head of the queue, and follow the links up and remove each. The expected number of nodes needed to be deleted is 1 + 1/2 + 1/4 + ... = 2.
  • find Kth can be achieved in O(logK) by starting from the leftest node (and not the root) and making your way up until you find you have "more sons then needed", and then treat the just found node as the root just like the algorithm in the question. Though it is better in asymptotic complexity - the constant factor is double.
like image 170
amit Avatar answered Nov 14 '22 23:11

amit


I found an interesting paper:

Sliding-Window Top-k Queries on Uncertain Streams published in VLDB 2008 and cited by 71.

https://www.cse.ust.hk/~yike/wtopk.pdf

VLDB is the best conference in database research area, and the number of citations proves the data structure actually works.

The paper looks pretty difficult, but if you really need improve your data structure, I suggest you to read this paper or papers in the reference page of this paper.

like image 24
Heejin Avatar answered Nov 14 '22 23:11

Heejin