Is it faster to add to a collection then sort it, or add to a sorted collection?

Tags:

java

collections

sorting

If I have a Map like this:

HashMap<Integer, ComparableObject> map;

and I want to obtain a collection of values sorted using natural ordering, which method is fastest?

(A)

Create an instance of a sortable collection like ArrayList, add the values, then sort it:

List<ComparableObject> sortedCollection = new ArrayList<ComparableObject>(map.values());
Collections.sort(sortedCollection);

(B)

Create an instance of an ordered collection like TreeSet, then add the values:

Set<ComparableObject> sortedCollection = new TreeSet<ComparableObject>(map.values());

Note that the resulting collection is never modified, so the sorting only needs to take place once.

770

asked Aug 31 '10 09:08

gutch

3 Answers

TreeSet guarantees O(log n) time for its add()/remove()/contains() methods. Sorting an ArrayList takes O(n log n) operations, but add() (amortized) and get() take only O(1).

So if you're mainly retrieving, and don't sort often, ArrayList is the better choice. If you sort often but don't retrieve that much, TreeSet would be a better choice.
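One practical difference is worth checking before choosing on speed alone: a TreeSet is a Set, so values whose compareTo() returns 0 are stored only once, while a sorted ArrayList keeps every value. The sketch below illustrates this with Integer values standing in for ComparableObject; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; Integer stands in for the question's ComparableObject.
class SortedValuesDemo {

    // Approach (A): copy into a list, then sort once. Duplicates are kept.
    static <T extends Comparable<? super T>> List<T> sortedList(Collection<T> values) {
        List<T> list = new ArrayList<>(values);
        Collections.sort(list);
        return list;
    }

    // Approach (B): insert into a TreeSet. Elements that compare as equal collapse into one.
    static <T extends Comparable<? super T>> Set<T> sortedSet(Collection<T> values) {
        return new TreeSet<>(values);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 2, 3, 1);
        System.out.println(sortedList(values)); // [1, 1, 2, 3, 3]
        System.out.println(sortedSet(values));  // [1, 2, 3]
    }
}
```

If the map can hold two values that compare as equal, only approach (A) returns all of them, so the two options are interchangeable only when the values are known to be distinct under compareTo().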

183

answered Oct 04 '22 17:10

fasseg

Theoretically, sorting at the end should be faster. Maintaining sorted state through the process could involve additional CPU time.

From a CS point of view, both operations are O(N log N), but a single sort should have a lower constant factor.
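One way to look at the constant factor without relying on wall-clock timing is to count comparisons. The following is a minimal sketch, assuming distinct Integer values in shuffled order; the class, the helper names, and the input size are invented for illustration. The counts it prints depend on the JDK version and on the input order, so treat them as a rough indication only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class for counting comparator calls.
class ComparisonCountDemo {

    // Wraps natural ordering so every comparison increments the counter.
    static Comparator<Integer> counting(AtomicLong counter) {
        return (a, b) -> {
            counter.incrementAndGet();
            return a.compareTo(b);
        };
    }

    // Comparisons used by copying into an ArrayList and sorting once.
    static long comparisonsForListSort(List<Integer> values) {
        AtomicLong counter = new AtomicLong();
        List<Integer> copy = new ArrayList<>(values);
        Collections.sort(copy, counting(counter));
        return counter.get();
    }

    // Comparisons used by inserting the values one by one into a TreeSet.
    static long comparisonsForTreeSet(List<Integer> values) {
        AtomicLong counter = new AtomicLong();
        TreeSet<Integer> set = new TreeSet<>(counting(counter));
        set.addAll(values);
        return counter.get();
    }

    public static void main(String[] args) {
        // Distinct values in shuffled order; a fixed seed keeps runs repeatable.
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            values.add(i);
        }
        Collections.shuffle(values, new Random(42));
        System.out.println("list sort: " + comparisonsForListSort(values));
        System.out.println("TreeSet:   " + comparisonsForTreeSet(values));
    }
}
```

Comparison counts ignore other costs such as node allocation and tree rebalancing in the TreeSet, so they are only part of the picture.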

answered Oct 04 '22 18:10

BarsMonster

Why not use the best of both worlds? If you are never going to modify it again, sort using a TreeSet and initialize an ArrayList with the contents:

List<ComparableObject> sortedCollection = 
    new ArrayList<ComparableObject>( 
          new TreeSet<ComparableObject>(map.values()));

EDIT:

I have created a benchmark (you can access it at pastebin.com/5pyPMJav) to test the three approaches (ArrayList + Collections.sort, TreeSet, and my best-of-both-worlds approach), and mine always wins. The test file creates a map with 10000 elements, the values of which have an intentionally awful comparator, and then each of the three strategies gets a chance to a) sort the data and b) iterate over it.

EDIT: I have added an aspect that logs calls to Thingy.compareTo(Thingy), and I have also added a new strategy based on PriorityQueue that is much faster than any of the previous solutions (at least in sorting).

Here is some sample output (you can test it yourselves):

compareTo() calls:123490
Transformer ArrayListTransformer
    Creation: 255885873 ns (0.255885873 seconds) 
    Iteration: 2582591 ns (0.002582591 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer TreeSetTransformer
    Creation: 199893004 ns (0.199893004 seconds) 
    Iteration: 4848242 ns (0.004848242 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer BestOfBothWorldsTransformer
    Creation: 216952504 ns (0.216952504 seconds) 
    Iteration: 1604604 ns (0.001604604 seconds) 
    Item count: 10000

compareTo() calls:18819
Transformer PriorityQueueTransformer
    Creation: 35119198 ns (0.035119198 seconds) 
    Iteration: 2803639 ns (0.002803639 seconds) 
    Item count: 10000

Strangely, my approach performs best in iteration. I would have thought there would be no difference from the ArrayList approach in iteration; do I have a bug in my benchmark?

Disclaimer: I know this is probably an awful benchmark, but it helps get the point across to you and I certainly did not manipulate it to make my approach win.

(The code has a dependency on Apache Commons Lang for the equals / hashCode / compareTo builders, but it should be easy to refactor it out.)
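One caveat on the PriorityQueue numbers: the iterator of java.util.PriorityQueue is documented as not traversing the elements in any particular order, so a low compareTo() count may reflect building the heap only, not a full sort. Whether the linked benchmark iterates over the queue or polls it is not visible here, so this is an assumption about what was measured. A minimal sketch of the difference, with Integer values and invented class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class; Integer stands in for the benchmark's Thingy.
class PriorityQueueOrderDemo {

    // Drains the queue with poll(), which always removes the smallest remaining element.
    static <T extends Comparable<? super T>> List<T> drainSorted(Collection<T> values) {
        PriorityQueue<T> queue = new PriorityQueue<>(values);
        List<T> sorted = new ArrayList<>(queue.size());
        while (!queue.isEmpty()) {
            sorted.add(queue.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 3, 4, 1, 2);
        // Iteration follows the internal heap layout, which is not guaranteed to be sorted.
        System.out.println(new ArrayList<>(new PriorityQueue<>(values)));
        // Polling yields the elements in natural order.
        System.out.println(drainSorted(values)); // [1, 2, 3, 4, 5]
    }
}
```

Each poll() costs O(log n), so draining the whole queue is O(n log n) again, and the comparisons saved at construction are paid back when the sorted order is actually read.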

answered Oct 04 '22 18:10

Sean Patrick Floyd
