What would be the most efficient way to take n smallest numbers from a sequence,
[ [1 2 3] [9 2 1] [2 3 4] [5 6 7] ]
I would like to take 2 smallest from the sequence based on the first item,
[1 2 3] [2 3 4]
currently I am sorting the whole list then taking first n items but that probably is not the most efficient way to go, it is a big list and I need to do this frequently.
The basic idea here is to create a min-heap of all n elements and then extract the minimum element K times. The last element to be extracted will be the Kth smallest element. Extract the minimum elements K-1 times, i.e. delete the root and perform heapify operation K times.
K'th smallest element in an unsorted array using sorting:Sort the given array and return the element at index K-1 in the sorted array. Follow the given steps to solve the problem: Sort the input array in the increasing order. Return the element at the K-1 index (0 – Based indexing) in the sorted array.
The Joy of Clojure, Chapter 6.4 describes a lazy sorting algorithm.The beauty of lazy sorting is that it will only do as much work as necessary to find the first x values. So if x << n this algorithm is O(n). Here is a modified version of that algorithm.
(defn sort-parts
[work f]
(lazy-seq
(loop [[part & parts] work]
(if-let [[pivot & xs] (seq part)]
(let [psmaller? (partial f pivot)]
(recur (list* (filter psmaller? xs)
pivot
(remove psmaller? xs)
parts)))
(when-let [[x & parts] parts]
(cons x
(sort-parts parts f)))))))
(defn qsort [xs f] (sort-parts (list xs) f))
(defn cmp [[a _ _] [b _ _]] (> a b))
(def a [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])
(take 2 (qsort a cmp))
As referenced, you can use the median-of-medians algorithm to select the kth smallest element in linear time, and then partition in linear time. This will provide you with the k smallest elements in O(n). The elements will however be unsorted, so if you want the k smallest elements sorted it will cost you another O(klogk).
A few important notes:
Firstly, although the complexity is O(n) small constants are not guaranteed and you might find minimal improvement, especially if your n is reasonably small. There are random linear selection algorithms that run in better actual times (usually the expected running time is O(n) with worse worst-cases but they have smaller constants than the deterministic ones).
Why can't you maintain the array in a sorted fashion? That would probably be much more performant. You would simply need to insert each element in the correct place which costs O(logn), but finding the k smallest would then be O(1) (or O(k) if you have to build the array afresh).
If you decide against the above note, then an alternative is to keep the array sorted after every such procedure, provide insert in O(1) to the end of the array and then execute a "merge sort" every time you need to find the k smallest elements. I.e. you sort only the new ones and then merge them in in linear time. So that would cost O(mlogm + n) where m is the number of elements added since last sort.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With