Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting second time round is faster

As part of a school exercise I would like to compare and contrast sorting algorithms as a Java exercise.

I implemented the sorting algorithms myself and we sort objects of a class Person which implements the Comparable interface.

So far so good, but what I can't explain is why during the first call to my sorting methods, the sorting takes longer than on subsequent calls?
The output bellow represents my results.
Best, Worst and Avg refer to the unsorted array that is passed to the sorting method:

  • Best: the array is already sorted
  • Worst: the array is sorted in reverse order
  • Avg: the objects in the array are at random order

This is my output:

1-call of the sorting methods 
InsertionSort Best:1799ms    Worst:78ms  Avg:789ms   
MergeSort     Best:10ms      Worst:3ms   Avg:5ms     

2-call of the sorting methods 
InsertionSort Best:1065ms    Worst:39ms  Avg:691ms   
MergeSort     Best:3ms       Worst:2ms   Avg:5ms     

3-call of the sorting methods 
InsertionSort Best:1066ms    Worst:39ms  Avg:692ms   
MergeSort     Best:3ms       Worst:2ms   Avg:5ms     

4-call of the sorting methods 
InsertionSort Best:1065ms    Worst:39ms  Avg:691ms   
MergeSort     Best:3ms       Worst:2ms   Avg:5ms     

Is the JVM doing any optimizations on subsequent calls?
I am puzzled and would greatly appreciate any help!

Edit: Thanks for suggestions and answers so far! To make a few points clear - each of the calls in my output refer to the time it take for a complete sorting!
After each sorting I make a new call with UNSORTED arrays again!

My source code can be downloaded as an eclipse project as a zip file, at the following Dropbox link: dropbox link to eclipse project.zip

P.S. I have no experience with profilers - if you could point me to a tutorial or so that would be great.

like image 779
erik.eilif Avatar asked Mar 12 '16 17:03

erik.eilif


People also ask

What sorting method is fastest?

But since it has the upper hand in the average cases for most inputs, Quicksort is generally considered the “fastest” sorting algorithm.

Which sorting gives best performance?

Quicksort has the 'best performance' in the 'average case' for most inputs, it is considered the 'fastest sorting algorithm' based on its storage capacity.

Why is sorted array faster?

In C++, it is faster to process a sorted array than an unsorted array because of branch prediction. In computer architecture, a branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not. Branch prediction doesn't play a significant role here.

Which sorting algorithm is best when list is already sorted?

Insertion sort runs much more efficiently if the array is already sorted or "close to sorted."


2 Answers

There are many things at work here, as the variety of responses indicate.

But the first run's long runtime is probably explained by JIT (just-in-time) compilation. As discussed here, your algorithm will run in the JVM for a while as interpreted bytecode. When the Hotspot monitor determines that your sort's loops are costly, the JVM will compile them to native code. After that, they'll run considerably faster. The first run is getting the disadvantage of running in the interpreter for a while plus the extra costs of compilation. This is why "warming up" is a common term in Java benchmarks.

The differences in performance on different inputs are tied to the sort algorithm. Many algorithms behave differently based on initial data organization, and many are deliberately organized to do well on initially sorted or nearly sorted data. Here is a brilliant demonstration for the case of nearly sorted input. E.g. insertion sort is quadratic time in general, but linear time on nearly sorted input (actually O((k+1)n) for input of size n where elements are no more than k positions from correctly sorted).

Then there is the branch prediction issue already referenced by link. Modern processors have various mechanisms that attempt to "guess" which way a branch (essentially an "if" statement at the machine level) will go based on recent history gathered as the program runs. The cost of a bad guess is high. The goodness of the guess is likely to be affected by both algorithm and data details.

like image 73
Gene Avatar answered Oct 30 '22 17:10

Gene


Processing a sorted array is faster than processing an un-sorted one due to Branch Prediction.
This has been covered extensively in the most famous Stack Overflow question.

like image 27
Idos Avatar answered Oct 30 '22 18:10

Idos