Prepare array in linear time to find k smallest elements in O(k)

Q: What is the kth smallest element?

Definition of the kth smallest element: it is the smallest value x such that at least k elements in the array are <= x.

This is an interesting question I found on the web. Given an array containing n numbers (with no information about them), we should pre-process the array in linear time so that, when we are later given a number 1 <= k <= n, we can return the k smallest elements in O(k) time.

I have been discussing this problem with some friends but no one could find a solution; any help would be appreciated!

asked Jun 23 '13 13:06

Idan

2 Answers

For the pre-processing step, we will run partition-based selection several times on the same data set.

Find the n/2-th smallest number with the selection algorithm. The data set is now partitioned into two halves, lower and upper. On the lower half, find the midpoint again. Do the same on its lower partition, and so on. Overall this is O(n) + O(n/2) + O(n/4) + ... = O(n), since the series sums to at most 2n.

Now when you have to return the k smallest elements, find the largest partition boundary x with x <= k. Everything below it can be returned as is, and from the next partition you have to return k - x more numbers. Since the next partition's size is O(k), running the selection algorithm on it for the (k - x)-th number yields the rest in O(k) time.
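The two phases described above could be sketched roughly as follows. This is only an illustration under some assumptions: the names `select_in_place`, `preprocess`, and `k_smallest` are made up for this example, and the selection step uses a randomized quickselect, so the bounds are expected O(n) and expected O(k) rather than worst case (a median-of-medians selection would be needed for a worst-case guarantee).

```python
import random

def select_in_place(a, lo, hi, r):
    """Rearrange a[lo:hi] so that a[lo:r] <= a[r] <= a[r+1:hi].
    Randomized quickselect (an assumption of this sketch): expected linear time.
    Requires lo <= r < hi."""
    hi -= 1  # switch to an inclusive upper bound
    while lo < hi:
        pivot = a[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if r <= j:
            hi = j          # rank r lies in the left part
        elif r >= i:
            lo = i          # rank r lies in the right part
        else:
            return          # a[r] equals the pivot and is already in place

def preprocess(data):
    """Return (arr, boundaries): for every boundary b, arr[:b] holds the
    b smallest elements. Work is n + n/2 + n/4 + ... = O(n) expected."""
    arr = list(data)
    boundaries = []
    hi = len(arr)
    while hi > 1:
        mid = hi // 2
        select_in_place(arr, 0, hi, mid)
        boundaries.append(mid)
        hi = mid
    boundaries.reverse()    # ascending: ..., n//4, n//2
    return arr, boundaries

def k_smallest(arr, boundaries, k):
    """Return the k smallest elements (unordered) in expected O(k) time."""
    n = len(arr)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    x, y = 0, n
    for b in boundaries:    # boundaries roughly double, so O(log k) steps
        if b <= k:
            x = b
        else:
            y = b
            break
    if x < k < y:
        # Only the block arr[x:y], whose size is O(k), is touched.
        select_in_place(arr, x, y, k)
    return arr[:k]
```

A call would look like `arr, bounds = preprocess(data)` once, followed by any number of `k_smallest(arr, bounds, k)` queries. Rearranging inside a single block during a query does not break the pre-processed layout, because every element of a block stays between the same two boundaries.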

answered Oct 19 '22 05:10

Karoly Horvath

We can find the median of a list and partition around it in linear time.

Then we can use the following algorithm: maintain a buffer of size 2k.

Every time the buffer fills up, we find the median and partition around it, keeping only the lowest k elements.
This requires about n/k find-median-and-partition steps, each of which takes O(k) time (expected time with a traditional quickselect). Overall, this approach requires only O(n) time.

If you also need the output sorted, sorting the k remaining elements adds O(k log k) time. In total, this approach requires only O(n + k log k) time and O(k) space.
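As a rough illustration of the buffer idea, here is a minimal sketch. The function names `keep_lowest` and `k_smallest_stream` and the `sort_output` flag are invented for this example, and the partition step is a simple randomized quickselect on lists, so the O(k) cost per step is an expected bound, not a worst-case one. Note that, unlike the pre-processing approach, this variant assumes k is known before the data is scanned.

```python
import random

def keep_lowest(values, k):
    """Return the k smallest items of values (unordered).
    Randomized quickselect on lists: expected O(len(values)) time.
    Assumes 0 <= k <= len(values)."""
    result = []
    while k > 0:
        pivot = random.choice(values)
        lows = [v for v in values if v < pivot]
        pivots = [v for v in values if v == pivot]
        highs = [v for v in values if v > pivot]
        if k <= len(lows):
            values = lows                       # answer lies entirely in lows
        elif k <= len(lows) + len(pivots):
            return result + lows + pivots[:k - len(lows)]
        else:
            result += lows + pivots             # all of these are kept
            k -= len(lows) + len(pivots)
            values = highs
    return result

def k_smallest_stream(items, k, sort_output=False):
    """Scan items once, holding at most 2k of them at any time."""
    buf = []
    for item in items:
        buf.append(item)
        if len(buf) >= 2 * k:
            buf = keep_lowest(buf, k)           # about n/k such steps overall
    if len(buf) > k:
        buf = keep_lowest(buf, k)               # final trim to exactly k
    return sorted(buf) if sort_output else buf  # sorting adds O(k log k)
```

For instance, `k_smallest_stream([7, 2, 9, 4, 1, 8], 3, sort_output=True)` gives `[1, 2, 4]`.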

answered Oct 19 '22 07:10

Jayram
