Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

An efficient sorting algorithm for almost sorted list containing time data?

The name says it all really. I suspect that insertion sort is best, since it's the best sort for mostly-sorted data in general. However, since I know more about the data there is a chance there are other sorts woth looking at. So the other relevant pieces of information are:

1) this is time data, which means I presumable could create an effective hash for ordering of data. 2) The data won't all exist at one time. instead I'll be reading in records which may contain a single vector, or dozen or hundreds of vectors. I want to output all time within a 5 second window. So it's possible that a sort that does the sorting as I insert the data would be a better option. 3) memory is not a big issue, but CPU speed is as this may be a bottleneck of the system.

Given these conditions can anyone suggest an algorithm that may be worth considering in addition to insertion sort? Also, How does one defined 'mostly sorted' to decide what is a good sort option? What I mean by that is how do I look at my data and decided 'this isn't as sorted as I thought it as, maybe insertion sort is no longer the best option'? Any link to an article which considered process complexity which better defines the complexity relative to the degree data is sorted would be appreciated.

Thanks

Edit: thank you everyone for your information. I will be going with an easy insertion or merge sort (whichever I have already pre-written) for now. However, I'll be trying some of the other methods once were closer to the optimization phase (since they take more effort to implement). I appreciate the help

like image 854
errah Avatar asked Jun 13 '12 14:06

errah


Video Answer


2 Answers

I would throw in merge sort if you implement the natural version you get a best case of O(N) with a typical and worst case of O(N log N) if you have any problems. Insertion you get a worst case of O(N^2) and a best case of O(N).

like image 42
Travis Pessetto Avatar answered Sep 21 '22 15:09

Travis Pessetto


You could adopt option (2) you suggested - sort the data while you insert elements.

Use a skip list, sorted according to time, ascending to maintain your data.

  • Once a new entree arrives - check if it is larger then the last element (easy and quick) if it is - simply append it (easy to do in a skip list). The skip list will need to add 2 nodes on average for these cases, and will be O(1) on average for these cases.
  • If the element is not larger then the last element - add it to the skip list as a standard insert op, which will be O(logn).

This approach will yield you O(n+klogn) algorithm, where k is the number of elements inserted out of order.

like image 51
amit Avatar answered Sep 22 '22 15:09

amit