What are efficient ways to sort arrays that have mostly a small set of duplicated elements? That is, a list like:
{ 10, 10, 55, 10, 999, 8851243, 10, 55, 55, 55, 10, 999, 8851243, 10 }
Assuming that the order of equal
elements doesn't matter, what are good worst-case/average-case algorithms?
A simple solution would be to use an efficient comparison-based sorting algorithm like Merge Sort, Quicksort, or Heapsort, which solves the problem in O(n log n) time but takes no advantage of the fact that the array contains many duplicated values. A better approach is to use a counting sort.
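For reference, here is a minimal counting-sort sketch in C++ (my own illustration, not from the answer). It assumes the values are non-negative and the maximum value is small enough that a frequency array over the whole range is affordable — which is not really true for the example above, where 8851243 would force a very large array; the hash-table variants further down avoid that problem.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counting sort for non-negative integers.
// Assumes max(values) is small enough that a frequency array of that size is acceptable.
void counting_sort(std::vector<int>& values) {
    if (values.empty()) return;
    int max_value = *std::max_element(values.begin(), values.end());
    std::vector<std::size_t> counts(static_cast<std::size_t>(max_value) + 1, 0);
    for (int v : values) ++counts[v];            // O(n): tally each value
    std::size_t out = 0;
    for (int v = 0; v <= max_value; ++v)         // O(range): emit each value counts[v] times
        for (std::size_t c = 0; c < counts[v]; ++c)
            values[out++] = v;
}
```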
As you may have observed, the time complexity of Quicksort is O(n log n) in the best and average cases and O(n^2) in the worst case. But since it has the upper hand on average for most inputs, Quicksort is generally considered the “fastest” general-purpose sorting algorithm.
Insertion sort is the clear winner when the data is nearly sorted. Bubble sort is fast there too, but insertion sort has lower overhead. Shellsort is fast because it is based on insertion sort. Merge sort, heapsort, and quicksort do not adapt to nearly sorted data.
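As a point of reference, here is a minimal insertion sort sketch in C++ (my own illustration): the inner loop does nothing but a comparison and a move, which is where the low overhead comes from, and on nearly sorted input that loop exits almost immediately.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: each element is shifted left until it sits after the last
// element not greater than it. On nearly sorted input the inner loop
// terminates almost immediately, giving close to O(n) total work.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];   // shift larger elements one slot to the right
            --j;
        }
        a[j] = key;
    }
}
```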
In general, the Quicksort algorithm has an average-case time complexity of O(n log n) and a worst-case time complexity of O(n^2). With a high density of duplicate keys, we almost always get the worst-case performance with the trivial implementation of Quicksort.
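To see why, here is a deliberately trivial quicksort sketch in C++ with a Lomuto-style partition (an illustration of the failure mode, not a recommended implementation): when every key equals the pivot, all elements end up on the same side of the partition, so the recursion depth becomes n and the total work O(n^2).

```cpp
#include <utility>
#include <vector>

// Lomuto-style partition: everything <= pivot goes to the left side.
// With many duplicate keys, elements equal to the pivot all fall on the same
// side, so the split is maximally unbalanced and quicksort degrades to O(n^2).
// Three-way ("fat pivot") partitioning avoids this.
int partition(std::vector<int>& a, int low, int high) {
    int pivot = a[high];
    int i = low - 1;
    for (int j = low; j < high; ++j)
        if (a[j] <= pivot)                 // equal keys also satisfy this test
            std::swap(a[++i], a[j]);
    std::swap(a[i + 1], a[high]);
    return i + 1;
}

void quicksort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;
    int p = partition(a, low, high);
    quicksort(a, low, p - 1);              // on all-equal input this range has n-1 elements
    quicksort(a, p + 1, high);             // ...and this one is empty, every time
}
```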
In practice, you can first iterate through the array once and use a hash table to count the number of occurrences of the individual elements (this is O(n), where n = size of the list). Then take all the unique elements and sort them (this is O(k log k), where k = number of unique elements), and then expand this back to a list of n elements in O(n) steps, recovering the counts from the hash table. If k << n you save time.
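A C++ sketch of that idea, assuming the elements are plain ints (count into a hash table, sort the k distinct keys, then expand):

```cpp
#include <algorithm>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Sort by counting duplicates: O(n) counting pass, O(k log k) sort of the
// distinct keys, O(n) expansion pass. A clear win when k << n.
void sort_by_counting(std::vector<int>& values) {
    std::unordered_map<int, std::size_t> counts;       // value -> number of occurrences
    for (int v : values) ++counts[v];                   // O(n)

    std::vector<int> unique_keys;
    unique_keys.reserve(counts.size());
    for (const auto& kv : counts) unique_keys.push_back(kv.first);
    std::sort(unique_keys.begin(), unique_keys.end());  // O(k log k)

    std::size_t out = 0;
    for (int key : unique_keys)                          // O(n) total: emit each key count times
        for (std::size_t c = 0; c < counts[key]; ++c)
            values[out++] = key;
}

int main() {
    std::vector<int> a{10, 10, 55, 10, 999, 8851243, 10, 55, 55, 55, 10, 999, 8851243, 10};
    sort_by_counting(a);
    for (int v : a) std::printf("%d ", v);               // prints 10 x6, 55 x4, 999 x2, 8851243 x2
    std::printf("\n");
}
```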
I would try counting sort with a mapping function. That is, you would not use a frequencies array whose size equals the range of the elements; instead you would iterate over the array, record the distinct elements, and use a mapping function to index into the array of frequencies.

This way the algorithm needs only one extra iteration plus a mapping function, which should run in constant time per element (using some kind of hash table). The counting and expansion passes are O(n); the only extra cost is putting the k distinct keys in order, which is negligible when k is small, so for arrays with few unique values this should be close to optimal.
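A closely related variant (my own sketch, not from the answer) replaces the hash table with a std::map, which keeps the distinct keys in sorted order as it counts, so no separate sort of the keys is needed; the counting pass then costs O(n log k) instead of O(n).

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Count into an ordered map: the keys come back out already sorted.
// O(n log k) to count, O(n) to expand, where k = number of distinct values.
void sort_via_ordered_counts(std::vector<int>& values) {
    std::map<int, std::size_t> counts;
    for (int v : values) ++counts[v];
    std::size_t out = 0;
    for (const auto& [key, count] : counts)
        for (std::size_t c = 0; c < count; ++c)
            values[out++] = key;
}
```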