I have a running stream of integers. How can I get the largest k elements from this stream at any point in time?


Optimal algorithm to return largest k elements from an array of infinite number of elements in running stream

1 Answer

The easiest solution is to maintain a min-heap of size k.

First, populate the heap with the first k elements.

Next, for each subsequent element in the stream, check whether it is larger than the heap's head (the smallest element currently kept). If it is, pop the current head and insert the new element instead.

At any point during the stream, the heap contains the k largest elements seen so far.

This algorithm is O(n log k), where n is the number of elements encountered so far in the stream.
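The min-heap approach above could be sketched in Python roughly as follows. The class name `TopK` and its methods are made up for illustration; only the standard `heapq` module is used. Note that `largest()` sorts the heap for readability, which costs O(k log k) per call; returning the heap contents unsorted would be O(k).

```python
import heapq


class TopK:
    """Track the k largest values seen so far with a size-k min-heap (illustrative name)."""

    def __init__(self, k):
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        self._heap = []  # min-heap: _heap[0] is the smallest of the kept values

    def add(self, value):
        if len(self._heap) < self.k:
            # Still filling the heap with the first k elements.
            heapq.heappush(self._heap, value)
        elif value > self._heap[0]:
            # Larger than the head: pop the head and insert the new value in one step.
            heapq.heapreplace(self._heap, value)

    def largest(self):
        """Return the current top k in descending order."""
        return sorted(self._heap, reverse=True)


top = TopK(3)
for x in [5, 1, 9, 3, 7, 9, 2]:
    top.add(x)
print(top.largest())  # [9, 9, 7]
```

Each `add` does at most one O(log k) heap operation, which is where the O(n log k) total comes from.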

Another solution, a bit more complex but theoretically better in terms of asymptotic complexity in some cases, is to hold an array of 2k elements.

First, load the first 2k elements.
Run a selection algorithm to find the highest k of them, and discard the rest. At this point you have only k elements left in the array.
Now, fill the array again with the next k elements, and repeat.

At each point, the array contains the k largest elements seen so far, plus up to k more elements that are not among the largest. To answer a query, run the selection algorithm on this array.
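A minimal sketch of the 2k-array idea, under some assumptions: the names `select_top` and `BufferedTopK` are hypothetical, and the selection step is a randomized quickselect, which is O(k) only in expectation. A worst-case linear selection (for example median of medians) would be needed to match the analysis exactly.

```python
import random


def select_top(items, k):
    """Return the k largest items in arbitrary order (randomized quickselect, expected linear time)."""
    items = list(items)
    result = []
    while k > 0 and items:
        if k >= len(items):
            result.extend(items)  # everything that is left belongs to the answer
            break
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(greater):
            items = greater  # the answer lies entirely among the larger values
        elif k <= len(greater) + len(equal):
            result.extend(greater)
            result.extend(equal[:k - len(greater)])
            break
        else:
            result.extend(greater)
            result.extend(equal)
            k -= len(greater) + len(equal)
            items = [x for x in items if x < pivot]
    return result


class BufferedTopK:
    """Keep at most 2k values; prune back to the k largest whenever the buffer fills up."""

    def __init__(self, k):
        self.k = k
        self._buf = []

    def add(self, value):
        self._buf.append(value)
        if len(self._buf) == 2 * self.k:
            self._buf = select_top(self._buf, self.k)  # discard the k smallest

    def largest(self):
        # One selection over at most 2k values per query.
        return select_top(self._buf, self.k)


b = BufferedTopK(3)
for x in [5, 1, 9, 3, 7, 9, 2, 8]:
    b.add(x)
print(sorted(b.largest(), reverse=True))  # [9, 9, 8]
```

The list-based partitioning trades extra allocations for readability; an in-place partition would be the usual choice when the buffer is large.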

Run time analysis:

Maintaining the array: Each run of the selection algorithm takes O(2k) = O(k) time. It runs once every k elements, so n/k times if n is the number of elements seen so far, which gives O(n/k * 2k) = O(n).

In addition, each query is O(k). If the number of queries is Q, the total run time is O(n + Q*k).

For this solution to be more efficient than the min-heap's O(n log k), we need Q*k < n log k:

Q*k < n log k
Q < (n/k) * log k

So, if the number of queries is limited as shown above, this solution could be more efficient in terms of asymptotic complexity.
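To get a feel for the bound, here is a small illustrative calculation. The function name `max_queries` is hypothetical, and the base-2 logarithm and the dropped constant factors are assumptions, so the result is a rough order of magnitude rather than an exact break-even point.

```python
import math


def max_queries(n, k):
    """Approximate query budget Q < (n/k) * log2(k) below which the 2k-array approach wins."""
    return (n / k) * math.log2(k)


# With 1,024,000 elements seen and k = 1024: n/k = 1000 and log2(k) = 10.
print(max_queries(1_024_000, 1024))  # 10000.0
```

So with these numbers the array approach only comes out ahead asymptotically if fewer than roughly ten thousand queries are made over about a million stream elements.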

In practice, getting the top k is usually done with the min-heap solution, at least wherever I've seen the need for it.

answered Oct 19 '22 02:10

amit


Tags:

arrays

algorithm

stream

data-structures

Ajay Gaur
