Getting n smallest numbers in a sequence

Tags: algorithm, clojure

What would be the most efficient way to take the n smallest items from a sequence such as

[ [1 2 3] [9 2 1] [2 3 4] [5 6 7] ]

I would like to take the 2 smallest from the sequence based on the first item:

[1 2 3] [2 3 4]

Currently I am sorting the whole list and then taking the first n items, but that is probably not the most efficient way to go. It is a big list and I need to do this frequently.


asked Oct 02 '11 05:10

Hamza Yerlikaya

2 Answers

The Joy of Clojure, Chapter 6.4, describes a lazy sorting algorithm. The beauty of lazy sorting is that it does only as much work as necessary to find the first x values, so if x << n this algorithm is O(n). Here is a modified version of that algorithm that takes the comparison function f as an argument.

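(defn sort-parts
  [work f]
  (lazy-seq
   (loop [[part & parts] work]
     (if-let [[pivot & xs] (seq part)]
       ;; Non-empty part: split it around its first element (the pivot).
       (let [psmaller? (partial f pivot)]
         (recur (list* (filter psmaller? xs)
                       pivot
                       (remove psmaller? xs)
                       parts)))
       ;; Empty part: the next item is a pivot already in its final position.
       (when-let [[x & parts] parts]
         (cons x
               (sort-parts parts f)))))))

(defn qsort [xs f] (sort-parts (list xs) f))

;; True when a's first item is greater than b's,
;; so (cmp pivot x) means x sorts before pivot.
(defn cmp [[a _ _] [b _ _]] (> a b))

(def a [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])

(take 2 (qsort a cmp))
;; => ([1 2 3] [2 3 4])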
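
For comparison, the eager approach described in the question can be written in one line with the core functions sort-by and take. This is only a baseline sketch for checking results; the name n-smallest-eager is made up for illustration. With distinct first items it should return the same elements as the lazy version above.

```clojure
(def a [[1 2 3] [9 2 1] [2 3 4] [5 6 7]])

;; Eager baseline: sort the whole collection by first item, then keep n.
;; Costs O(n log n) on every call, no matter how small n is.
(defn n-smallest-eager [n coll]
  (take n (sort-by first coll)))

(n-smallest-eager 2 a)
;; => ([1 2 3] [2 3 4])
```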


answered Sep 24 '22 01:09

Julien Chastang

As referenced, you can use the median-of-medians algorithm to select the kth smallest element in linear time, and then partition around it in linear time. This gives you the k smallest elements in O(n). The elements will be unsorted, however, so if you want the k smallest in sorted order it will cost another O(k log k).
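
The select-then-partition idea can be sketched in Clojure roughly as follows. Note that this sketch picks a random pivot instead of running median-of-medians, so it is expected O(n) rather than worst-case O(n). The name k-smallest and its argument order are assumptions made for illustration, and keys are assumed to be mutually comparable with compare.

```clojure
(defn k-smallest
  "Returns a vector of the k smallest items of coll by (keyfn item),
  in no particular order. Random-pivot selection: expected O(n),
  worst case O(n^2)."
  [keyfn k coll]
  (loop [items (vec coll), k k, acc []]
    (cond
      (<= k 0) acc
      ;; Fewer items left than requested: all of them qualify.
      (<= (count items) k) (into acc items)
      :else
      (let [p       (keyfn (rand-nth items))
            smaller (filterv #(neg?  (compare (keyfn %) p)) items)
            equal   (filterv #(zero? (compare (keyfn %) p)) items)
            larger  (filterv #(pos?  (compare (keyfn %) p)) items)]
        (cond
          ;; All k answers lie strictly below the pivot key.
          (<= k (count smaller))
          (recur smaller k acc)

          ;; Everything below the pivot qualifies; fill up from the ties.
          (<= k (+ (count smaller) (count equal)))
          (into acc (concat smaller (take (- k (count smaller)) equal)))

          ;; Everything up to the pivot qualifies; keep selecting above it.
          :else
          (recur larger
                 (- k (count smaller) (count equal))
                 (into (into acc smaller) equal)))))))

;; Usage: the result is unordered, so sort it if order matters.
(sort-by first (k-smallest first 2 [[1 2 3] [9 2 1] [2 3 4] [5 6 7]]))
;; => ([1 2 3] [2 3 4])
```

Sorting only the k selected items afterwards is where the extra O(k log k) comes from.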

A few important notes:

1. Although the complexity is O(n), small constants are not guaranteed and you might see minimal improvement, especially if your n is reasonably small. There are randomized linear selection algorithms that run faster in practice (usually the expected running time is O(n) with a worse worst case, but they have smaller constants than the deterministic ones).
2. Why can't you keep the array sorted? That would probably be much more performant. You would simply need to insert each element in the correct place, which costs O(log n) to locate (and stays O(log n) overall if you use a balanced tree rather than a plain array, where insertion also shifts elements), but finding the k smallest would then be O(1) (or O(k) if you have to build the array afresh).
3. If you decide against the above note, an alternative is to keep the array sorted after every such procedure, provide O(1) insertion at the end of the array, and then execute a "merge sort" every time you need to find the k smallest elements. That is, you sort only the new elements and then merge them in linear time. That would cost O(m log m + n), where m is the number of elements added since the last sort.
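
As a rough illustration of note 2 in Clojure, a sorted-set keeps items ordered as they are added, with O(log n) insertion. This sketch relies on Clojure's built-in vector ordering, which compares vectors of equal length element by element, so it assumes all items are vectors of the same length. Note also that a set silently drops exact duplicates, which may or may not suit your data.

```clojure
;; Build the sorted structure once.
(def sorted-items
  (into (sorted-set) [[1 2 3] [9 2 1] [2 3 4] [5 6 7]]))

;; Taking the k smallest just walks the first k entries.
(take 2 sorted-items)
;; => ([1 2 3] [2 3 4])

;; Adding an element later keeps the order; conj on a sorted set is O(log n).
(def with-new (conj sorted-items [0 8 8]))

(take 2 with-new)
;; => ([0 8 8] [1 2 3])
```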

answered Sep 24 '22 01:09

davin
