Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search Algorithm to find the k lowest values in a list

I have a list that contains n double values and I need to find the k lowest double values in that list

  • k is much smaller than n
  • the initial list with the n double values is randomly ordered
  • the found k lowest double values are not required to be sorted

What algorithm would you recommend?

At the moment I use Quicksort to sort the whole list, and then I take the first k elements out of the sorted list. I expect there should be a much faster algorithm.

Thank you for your help!!!

like image 961
user667967 Avatar asked Jul 10 '12 05:07

user667967


2 Answers

You could model your solution to match the nlargest() code in Python's standard library.

  • Heapify the first k values on a maxheap.
  • Iterate over the remaining n - k values.
  • Compare each to the element of the top of the heap.
  • If the new value is lower, do a heapreplace operation (which replaces the topmost heap element with the new value and then sifts it downward).

The algorithm can be surprisingly efficient. For example, when n=100,000 and k=100, the number of comparisons is typically around 106,000 for randomly arranged inputs. This is only slightly more than 100,000 comparisons to find a single minimum value. And, it does about twenty times fewer comparisons than a full quicksort on the whole dataset.

The relative strength of various algorithms is studied and summarized at: http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest

like image 170
Raymond Hettinger Avatar answered Oct 04 '22 17:10

Raymond Hettinger


You can use selection algorithm to find the kth lowest element and then iterate and return it and all elements that are lower then it. More work has to be done if the list can contain duplicates (making sure you don't end up with more elements that you need).
This solution is O(n). Selection algorithm is implemented in C++ as nth_element()

Another alternative is to use a max heap of size k, and iterate the elements while maintaining the heap to hold all k smallest elements.

for each element x:
   if (heap.size() < k):
      heap.add(x)
   else if x < heap.max():
      heap.pop()
      heap.add(x)

When you are done - the heap contains k smallest elements.
This solution is O(nlogk)

like image 29
amit Avatar answered Oct 04 '22 16:10

amit