
Why is order of growth preferred as a benchmark for algorithm performance wrt runtime?

Tags:

algorithm

I learnt that growth rate is often used to gauge the runtime and efficiency of an algorithm. My question is: why use growth rate instead of the exact (or approximate) relation between runtime and input size?

Edit:

Thanks for the responses. I would like to clarify what I meant by "relation between the runtime and input size" as it is a little vague.

From what I understand, growth rate is the gradient of the runtime against input size. So a growth rate of n^2 would give an equation of the form t = k(n^2) + c for some constant c. Given that the equation is more informative (it includes the constants) and shows a direct relation to the time needed, I thought it would be preferred.

I do understand that as n increases, the constants soon become irrelevant, and that k will differ depending on the computing configuration. Perhaps that is why it is sufficient to work with just the growth rate.
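A quick numeric sketch of that point (using made-up values k = 2.5 and c = 1000 in the hypothetical equation t = k*n^2 + c):

```python
# Hypothetical exact-runtime equation for an O(n^2) algorithm: t = k*n^2 + c.
# k and c are made-up values; the point is that they stop mattering as n grows.

def runtime_model(n, k=2.5, c=1000):
    """Made-up 'exact' runtime for an O(n^2) algorithm."""
    return k * n ** 2 + c

for n in (10, 100, 1_000, 10_000, 100_000):
    t = runtime_model(n)
    ratio = runtime_model(2 * n) / t
    print(f"n={n:>7}  t={t:>16,.0f}  t(2n)/t(n)={ratio:.3f}")

# The ratio t(2n)/t(n) starts well below 4 (the +c term still matters for small n)
# but converges to 4.0 as n grows - the behaviour "O(n^2)" predicts, regardless of
# the particular k and c.
```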

asked Feb 11 '14 by user3218450

2 Answers

The algorithm isn't the only factor affecting actual running time

Things like programming language, optimizations, branch prediction, I/O speed, paging, processing speed, etc. all come into play.

One language / machine / whatever may certainly have advantages over another, so every algorithm needs to be executed under the exact same conditions.

Beyond that, one algorithm may outperform another in C when considering input and output residing in RAM, but the other may outperform the first in Python when considering input and output residing on disk.

There is little to no chance of agreement on the exact conditions that should be used for all this benchmarking. Even if such agreement could be reached, it would be irresponsible to use 5-year-old benchmark results in today's computing world, so the results would need to be recreated for every algorithm on a regular basis - a massive, very time-consuming task.

Algorithms have varying constant factors

In the extreme case, certain algorithms have constant factors so high that asymptotically slower algorithms outperform them on all inputs of any reasonable size today. If we merely went by measured running times, the fact that these algorithms would overtake the others on large enough inputs may be lost.

In the less extreme case, the constant factors involved mean we'll get different results at different input sizes - one algorithm may look faster in all of our tests, yet as soon as we pass some input size, the other may become faster.
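As a rough sketch of that crossover effect, here are two pure-Python sorts - an insertion sort (O(n^2), but very little per-element overhead) and a merge sort (O(n log n), but more bookkeeping). The exact crossover point is not meaningful in itself; it shifts with the machine, interpreter and implementation details:

```python
import random
import timeit

def insertion_sort(a):
    """O(n^2) comparisons/moves, but very low overhead per element."""
    a = list(a)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    """O(n log n) comparisons, but more bookkeeping (slicing, merging)."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

for n in (16, 64, 256, 1024):
    data = [random.random() for _ in range(n)]
    t_ins = timeit.timeit(lambda: insertion_sort(data), number=10)
    t_mrg = timeit.timeit(lambda: merge_sort(data), number=10)
    winner = "insertion" if t_ins < t_mrg else "merge"
    print(f"n={n:>5}  insertion={t_ins:.4f}s  merge={t_mrg:.4f}s  faster: {winner}")
```

Typically the insertion sort wins for small n and the merge sort takes over somewhere in the tens-to-hundreds range, but where exactly depends entirely on the environment.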

The running times of some algorithms depend greatly on the input

Basic quicksort (always picking, say, the first or last element as the pivot) takes O(n^2) time on already-sorted data, for example, while it takes O(n log n) on average.

One can certainly determine the best and worst cases and run the algorithm for those, but the average case can only be determined through mathematical analysis - you can't run it on 'the average case'. You could run it many times on random inputs and average the results, but that's fairly imprecise.
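To sketch that input-dependence, the snippet below counts comparisons (rather than wall-clock time) made by a basic last-element-pivot quicksort on already-sorted versus randomly-ordered input. It is written iteratively only to avoid hitting Python's recursion limit in the worst case:

```python
import random

def quicksort_comparisons(a):
    """Count element comparisons made by a basic quicksort that always picks
    the last element of the current range as the pivot (Lomuto partition)."""
    a = list(a)
    comparisons = 0
    stack = [(0, len(a) - 1)]          # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]      # put the pivot in its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons

n = 2000
already_sorted = list(range(n))
random_order = random.sample(range(n), n)

print("already sorted:", quicksort_comparisons(already_sorted))  # ~n^2/2, roughly 2,000,000
print("random order:  ", quicksort_comparisons(random_order))    # ~1.4*n*log2(n), roughly 30,000
```

The same algorithm, run on the same-sized input, does wildly different amounts of work depending on how that input happens to be arranged.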

So a rough estimate is sufficient

Because of the above reasons, it makes sense to just say an algorithm is, for example, O(n^2), which very roughly means that, for large enough input sizes, it takes about 4 times longer when the input size doubles. If you've been paying attention, you'll know that the actual time taken could be quite different from 4 times longer, but it at least gives us some idea - we won't expect it to take twice as long, nor 10 times longer (although it might under extreme circumstances). We can also reasonably expect an O(n log n) algorithm to outperform an O(n^2) algorithm for large n, which is a useful comparison, and often makes it easier to see what's going on than some more exact representation would.
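A rough empirical check of that "about 4 times longer" claim, using a simple O(n^2) pair-counting function (the exact ratios will wobble with interpreter overhead, caching, and whatever else the machine is doing):

```python
import random
import timeit

def count_pairs_in_range(values, lo, hi):
    """O(n^2): count pairs (i, j) with i < j whose sum falls in [lo, hi]."""
    count = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if lo <= values[i] + values[j] <= hi:
                count += 1
    return count

for n in (250, 500, 1000):
    data = [random.random() for _ in range(n)]
    data_doubled = [random.random() for _ in range(2 * n)]
    t1 = timeit.timeit(lambda: count_pairs_in_range(data, 0.9, 1.1), number=3)
    t2 = timeit.timeit(lambda: count_pairs_in_range(data_doubled, 0.9, 1.1), number=3)
    print(f"n={n:>5}: doubling the input made it {t2 / t1:.2f}x slower (theory: ~4x)")
```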

answered Sep 28 '22 by Bernhard Barker


You can use both types of measures. In practice, it can be useful to measure performance with the specific inputs you are likely to work with (benchmarking), but it is also quite useful to know the asymptotic behavior of an algorithm, as that tells you its (space/time) cost in the general case of "very large inputs" (technically, as n -> infinity).

Remember that in many cases the main term of the runtime far outweighs the lower-order terms, especially as n takes on large values. Therefore, we can summarize, or abstract away, information by giving a "growth rate" or bound on the algorithm's performance, instead of working with the "exact" runtime or space requirements. "Exact" is in quotes because the constants for the various terms of the runtime vary between runs and between machines - different conditions produce different "constants". In summary, we are interested in asymptotic algorithm behavior because it is still very useful and machine-agnostic.
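To make that concrete, here is a small sketch with a made-up "exact" cost function, 3n^2 + 100n + 1000, showing how quickly the lower-order terms stop mattering:

```python
def exact_cost(n):
    """Hypothetical exact operation count: 3n^2 + 100n + 1000."""
    return 3 * n ** 2 + 100 * n + 1000

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}  exact_cost(n) / n^2 = {exact_cost(n) / n ** 2:.3f}")

# The ratio settles toward the leading constant (3): for large n the n^2 term is
# essentially the whole story, which is why reporting just "O(n^2)" loses little.
```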

answered Sep 28 '22 by Martin Dinov