Give a data structure that stores comparable objects and supports add() and get(k) operations, where get(k) returns the kth smallest element in the data structure (1 <= k <= n). get(k) must be O(1) and add() must be O(log n), where n is the number of objects added to the data structure. Give another structure where get(k) is O(log n) and add() is O(1).
If we sort the array in ascending order, the kth element (1-indexed) is the kth smallest element. The kth largest element is the same as the (N - k + 1)th smallest, where N is the size of the array, so the kth-smallest approach answers kth-largest queries too: after an ascending sort, the kth largest sits at 0-based index N - k.
For example, given the unsorted array [3, 2, 1, 5, 6, 4] and k = 2, the result is 5: sorting gives [1, 2, 3, 4, 5, 6], and the element at index n - k = 4 is 5 (when k = 1, this is simply the last element).
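Here is a minimal sketch of that sort-based approach in Python (the function names are mine, just for illustration):

```python
def kth_smallest(arr, k):
    """kth smallest (1-indexed) of an unsorted list, via sorting: O(n log n)."""
    return sorted(arr)[k - 1]

def kth_largest(arr, k):
    """kth largest is the (n - k + 1)th smallest, i.e. index n - k
    after an ascending sort (0-indexed)."""
    return sorted(arr)[len(arr) - k]

# Example from above: the 2nd largest of [3, 2, 1, 5, 6, 4] is 5.
print(kth_largest([3, 2, 1, 5, 6, 4], 2))  # 5
```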
The best data structure for keeping track of the top k elements is a heap. Iterate through the array one element at a time, maintaining the k largest elements seen so far in a min-heap; each time we find a number larger than the smallest number in the heap, we do two things: take the smallest number out of the heap, and insert the new number. The root of the heap is then always the kth largest element seen so far.
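A short sketch of that heap idea using Python's heapq; the min-heap of size k holds the k largest elements seen so far, so its root is the kth largest (the function name is mine):

```python
import heapq

def kth_largest_stream(nums, k):
    """Keep the k largest elements seen so far in a min-heap of size k.
    The heap's root is then the kth largest element."""
    heap = []
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k:
            # pop the smallest and push x in one step.
            heapq.heapreplace(heap, x)
    return heap[0]

print(kth_largest_stream([3, 2, 1, 5, 6, 4], 2))  # 5
```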
If I got this interview question, I would respond by saying that I am unaware of any such data structures and suspect that they don't exist. However, I suspect that the data structures the interviewer is thinking of are a "sorted array" and a "skip list", respectively.
I would then explain that retrieving any element of an array by position is O(1), and figuring out where to insert a new element is O(log n) with binary search. However, the actual insertion is O(n) because the rest of the array has to be shifted, although that O(n) step comes with very good constants.
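A rough Python sketch of that sorted-array structure (class and method names are mine); bisect.insort does the O(log n) search, but the underlying list shift still makes add() O(n) overall, as noted above:

```python
import bisect

class SortedArrayKth:
    """get(k) is O(1); add() is O(log n) to find the slot plus O(n) to shift."""
    def __init__(self):
        self._data = []

    def add(self, x):
        bisect.insort(self._data, x)   # binary search + list insertion

    def get(self, k):
        return self._data[k - 1]       # kth smallest, 1-indexed

s = SortedArrayKth()
for x in [3, 2, 1, 5, 6, 4]:
    s.add(x)
print(s.get(2))  # 2 (the 2nd smallest)
```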
For the skip list, retrieving is O(log n). Inserting modifies only one pointer half of the time, two pointers a quarter of the time, three pointers an eighth of the time, and so on, for an average of two pointers, which is O(1). However, you cannot insert an element without figuring out where to put it, and that lookup is O(log n). (To make the insert truly O(1), you either need to collect O(log n) data during the lookup and make it available to the insert, or you need to create the moral equivalent of a doubly linked skip list.)
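For reference, here is a rough sketch of an indexable skip list in Python, where each forward pointer also stores a width (how many positions it skips) so that get(k) can be answered by rank. As written, both operations are expected O(log n), since add() still performs the lookup mentioned above; all names are mine:

```python
import random

MAX_LEVEL = 16

class _Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level   # next node at each level
        self.width = [1] * level        # positions skipped by each pointer

class IndexableSkipList:
    """Skip list whose pointers carry widths, so get(k) works by rank."""

    def __init__(self):
        self.head = _Node(None, MAX_LEVEL)
        self.level = 1
        self.size = 0

    def _random_level(self):
        lvl = 1
        while lvl < MAX_LEVEL and random.random() < 0.5:
            lvl += 1
        return lvl

    def add(self, value):
        update = [None] * MAX_LEVEL   # rightmost node visited at each level
        pos = [0] * MAX_LEVEL         # rank of that node
        node, rank = self.head, 0
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].value < value:
                rank += node.width[i]
                node = node.forward[i]
            update[i], pos[i] = node, rank
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level, lvl):
                update[i], pos[i] = self.head, 0
                self.head.width[i] = self.size  # new head pointer spans the list
            self.level = lvl
        new = _Node(value, lvl)
        for i in range(lvl):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
            # split the old width between the predecessor and the new node
            new.width[i] = pos[i] + update[i].width[i] - pos[0]
            update[i].width[i] = pos[0] - pos[i] + 1
        for i in range(lvl, self.level):
            update[i].width[i] += 1   # higher pointers now skip one more element
        self.size += 1

    def get(self, k):
        """Return the kth smallest element (1 <= k <= size) by walking widths."""
        node, rank = self.head, 0
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and rank + node.width[i] <= k:
                rank += node.width[i]
                node = node.forward[i]
        return node.value

sl = IndexableSkipList()
for x in [3, 2, 1, 5, 6, 4]:
    sl.add(x)
print(sl.get(2))  # 2
```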
There's no way to build a deterministic comparison-based data structure with amortized O(1)-time adds and worst-case O(log n)-time gets. The other configuration (O(1)-time gets with O(log n)-time adds) cannot be ruled out by an information-theoretic lower bound, but I seriously doubt that anyone knows how to do it.
For the experts: the adversary first adds n items, answering the algorithm's O(n) comparisons in such a way as to leave an antichain of size at least log² n. It then chooses k in such a way that computing get(k) forces the algorithm to do selection on the antichain, incurring a cost of Ω(log² n), which is incompatible with worst-case O(log n) gets.
Why can the adversary force such a large antichain? Suppose that the algorithm always left no antichain of more than log² n elements. By Dilworth's theorem, the elements can be partitioned into at most log² n chains, which can be merged using O(n log log n) comparisons, giving a sorting algorithm that uses o(n log n) comparisons and thus a contradiction.
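Spelling out that count, using the standard fact that m sorted chains of total length n can be merged with O(n log m) comparisons:

\[
\underbrace{O(n)}_{\text{adds, amortized } O(1) \text{ each}}
\;+\;
\underbrace{O\bigl(n \log(\log^2 n)\bigr)}_{\text{merging at most } \log^2 n \text{ chains}}
\;=\; O(n \log\log n) \;=\; o(n \log n),
\]

which contradicts the Ω(n log n) comparison lower bound for sorting.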
What could your interviewer have meant? It's conceivable to me that if both operations are amortized, then there's a solution. This is a non-canonical relaxation of the requirements, however.