Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do priority of algorithm change, when input size change

I am studying about time complexity of algorithms. The book explained that

Running time of insertion sort is O(n^2)

Running time of merge sort is O(n logn)

When n is small, insertion sort is better And When n is large merge sort is better.

I am not understanding this concept that why this is so? Why do the priority of algorithm vary when input size vary?

like image 793
TariqS Avatar asked May 15 '26 23:05

TariqS


1 Answers

The study of complexity is used to see how the efficiency of your program changes as the problem size grows. This is useful if your input sizes don't have defined bounds.

However, if your problem size has a defined bound, such as n is less than 60, then the complexity of a solving algorithm won't be useful to you: an algorithm with O(1) complexity may be slower than one with O(n2) complexity!

When algorithm A has lower Big-O (worst-case) time complexity than B, generally you can think that this implies that there exists an N such that all problem sizes larger than N will be solved faster by A.

More correctly, it means that for any n larger than N, the worst-case problem of size n for algorithm A will always be faster (have fewer steps) than the worst-case problem of size n for algorithm B (typically these will be completely different problems!).

In programming, big-O complexity theory is useful, but only in how it helps us understand the average-case complexity (which is harder to analyse). In your example, merge sort is used instead of insertion sort for large problem sizes, not because of the Big-O complexity, but because the average-case complexity (which, in this case, is the same as the Big-O complexity).

An example of when Big-O complexity doesn't matter in practice, (even if we care about arbitrarily large problem sizes):

  • Quick sort, O(n2), average(n log(n))
  • Merge sort, O(n log(n)), average(n log(n))

Because the average complexities are the same we cannot tell who is faster without more analysis. This is kind of like a tie-breaker. The results of this analysis are famous (not that I've ever looked at it): quick sort is significantly faster (has a lower multiplicative constant associated with it's average-case complexity). This means that if the number of sorting elements is arbitrarily large, quick sort is always the best option because on average it will be faster.

Lastly, an algorithm having a higher "hidden multiplicative constant" is sometimes obvious when you look at the steps in the algorithm, but in practice this is really less mathematical and more experimental. In practice the "steps" in your algorithm are performed by your processor and depend on its architecture and how you've used your programming language to instruct it, and how the processor caches/fetches memory, your operating system's system call procedure, etc. etc.

In summary: What's certain is that if algorithm A has a better (lower) average-case complexity than algorithm B, then there exists some N where A will on average solve problems faster than B for problem instances larger than N. We can measure the performance to find (or approximately find) the lowest N. If algorithm A is a clear winner compared to the other, then N may be 0, but often algorithms with low complexity are - ironically - more complicated and typically perform more analysis at each step which has some overhead, making them slower for small problem instances.

Final final comment: If you can find the lowest N such that algorithm A is always faster than algorithm B for problem sizes greater than n, then this doesn't actually imply the inverse: there may not exist N' > 2 such that any problem instance of size less than N' will be solved faster by B. If this N' exists, then often there is a "middle" size range where the dominating efficiency changes between the two options, something like this:

  • 1-83, B is faster
  • 84-92, A is faster
  • 93-111, B is faster
  • 112 onwards, A is faster

If the problem is slow to solve at sizes of around 90 (more than a millisecond or so), then we might do a full range check to see which is best. Otherwise we might just say if (n > 111) do A, even though we know that there are some instances that will be solved by B which would have been faster solved by A. And if the efficiency benefits of doing B for the smaller ranges aren't significant enough, we might always choose A.

like image 148
Elliott Avatar answered May 18 '26 23:05

Elliott



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!