 

Can Big(O) be affirmed by measurement?

Let's say you have designed an algorithm which you think runs in O(n). If I measure the time it takes with an input of 1,000, then increase the input 10x and measure again, can I infer that O(n) is correct if and only if the run time is roughly 10 times the first?

How stupid would this be? Obviously repeating the test would give better accuracy, but I want to know if this makes sense at all.

asked Apr 10 '15 by Sam R.


2 Answers

Often, the answer is 'yes'. If you increase the problem size by a factor of 10 and the time goes up by a factor of 10, you're probably right to assume O(N). However, the ratio is unlikely to be quite so clean.

If you go from 1,000 to 10,000, an O(N.logN) algorithm's time goes up by a factor of roughly 13 (see the bc session below). That's not far off 10, and you might mistakenly take an increase of about 13 to indicate O(N.logN) rather than O(N), or vice versa. However, if you increase the size by 10 and the time goes up by about 100, you're likely dealing with a non-linear algorithm — probably O(N²). So two points are not enough, but they are indicative. Multiple runs and more data points both help.
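To see how distinguishable the common classes are, here is a small sketch that computes the expected run-time ratio for a 10x size increase under a few complexity classes (constant factors cancel out in the ratio):

```python
import math

# Expected run-time ratios when the input grows from 1,000 to 10,000,
# for a few common complexity classes. Constant factors cancel in the ratio.
n, m = 1_000, 10_000
classes = {
    "O(N)":      lambda k: k,
    "O(N.logN)": lambda k: k * math.log(k),
    "O(N^2)":    lambda k: k * k,
}
for name, f in classes.items():
    print(f"{name}: time ratio = {f(m) / f(n):.2f}")
```

This prints 10.00 for O(N), 13.33 for O(N.logN), and 100.00 for O(N²) — the linear and linearithmic classes are uncomfortably close, while the quadratic class is unmistakable.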

Sometimes, though, something else kicks in. For example, you might suddenly start using so much memory that your program pages to disk instead of just running. It will slow down dramatically, even though the algorithm is still linear given enough resources.

Also, beware caching effects and optimization effects. Caching can make things seem faster than they are. If the optimizer concludes that the result of a calculation is never used, it might eliminate the entire calculation. So you have to be cautious.

But, with a bit of luck, you can scale the problem a few orders of magnitude (or, at least, a few different numbers) and come up with a reasonable guess as to whether it is linear or something else.
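As a concrete starting point, a minimal timing sketch might look like the following. The `best_time` helper is a made-up name for illustration; taking the best of several repeats is one common way to damp measurement noise:

```python
import time

def best_time(func, data, repeats=5):
    """Run func(data) several times and keep the best (least noisy) timing."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best

# sum() over a list is O(n): doubling the size should roughly double the time.
for n in (100_000, 200_000, 400_000):
    data = list(range(n))
    print(f"n={n}: {best_time(sum, data):.4f}s")
```

Note the input is built outside the timed region, so list construction doesn't pollute the measurement.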

O(N.logN) for 1,000 vs 10,000

$ bc -l
n=1000
n*l(n)
6907.75527898213705205000
a=n*l(n)
m=n*10
m*l(m)
92103.40371976182736070000
b=m*l(m)
b/a
13.33333333333333333333
quit
$
answered Nov 10 '22 by Jonathan Leffler


Contrary to the other answer, I will say "no". However, you can get a pretty good guess (not even an estimate, as that word is not really appropriate here). This is probably what is meant by "often".

The thing is that you never know the constant factors. Big-O describes asymptotic behaviour (as n goes to infinity), which is precisely what makes it so useful: you can drop everything except the fastest-growing term. But it also means that, mathematically, you cannot confirm your assumption by measurement.

First, there are plenty of algorithms and use cases where the asymptotic behaviour is not useful in real-life applications, simply because the "typical use-case" input distribution never reaches the asymptotic regime. That is quite often the case, and you could still test/"estimate" behaviour there.

But there are also cases where the asymptotically best algorithm has constant factors so large that it is not practical on modern systems. The best examples I know are large-number multiplication algorithms.

However, there are systems that "approximate" (better said, guess) the complexity class of an algorithm. I am not sure whether Codility measures it or derives the guess from code analysis, but they are able to do it: https://app.codility.com/public-report-detail/.

What you could do is run the algorithm with varying input sizes and fit the measured data to a model. It's quite simple. Then you can say that, for the tested input range, the algorithm behaves as O(class(n)). (This can have a practical meaning even more valuable than the theoretical asymptotic complexity.)
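One simple way to do that fit, assuming a polynomial model t = c·n^k, is a least-squares line through the measurements in log-log space, where the slope is the exponent k. The measurement values below are made up for illustration:

```python
import math

# Hypothetical measurements: (input size, seconds). In practice these come
# from timing your own algorithm at several sizes.
data = [(1_000, 0.012), (2_000, 0.049), (4_000, 0.198), (8_000, 0.801)]

# Fit t = c * n^k by least squares on log t = log c + k * log n;
# the slope of the log-log line is the exponent k.
xs = [math.log(n) for n, _ in data]
ys = [math.log(t) for _, t in data]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
k = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
print(f"estimated exponent k = {k:.2f}")  # close to 2, suggesting O(n^2)
```

A polynomial model won't detect an n·log n or exponential term, of course; for those you would compare fits of several candidate models instead.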

Note that choosing the testing points is not trivial. Basically, if your algorithm behaves "fast", you need to keep increasing the input size until the next complexity class kicks in. E.g. for something like (100n + n!), you can't go n = {1, 10, 100} because the n! term will kill the execution time at the larger sizes, but going n = {1, 2, 3, 4, 5, 6, 7} won't clearly pick up the n! part either (OK, 7! is 5040, but distinguishing it from, say, an n² term would be much harder).
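To see where the factorial term of the hypothetical cost 100n + n! takes over, a quick table helps:

```python
import math

# Compare the two terms of the hypothetical cost T(n) = 100*n + n!
# to see where the factorial starts to dominate.
for n in range(1, 9):
    linear, factorial = 100 * n, math.factorial(n)
    marker = "  <- n! dominates" if factorial > linear else ""
    print(f"n={n}: 100n={linear}, n!={factorial}{marker}")
```

The crossover happens at n = 6 (720 > 600), so measurements confined to small n mostly reflect the linear term, while even n = 10 already gives n! = 3,628,800 — which is why picking the sampling range is the hard part.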

Bottom line: getting a decent guess is certainly possible, but beyond the simplest cases it can be tricky and hard to do — and, sadly, it is also quite hard to tell whether your case is tricky or not.

Also, this discussion is purely theoretical and skips hardware effects. I have heard of algorithms whose n² version behaved better in practice than the n·log n one because the former was always (very) cache-friendly — but don't take my word for it, I can't recall the source.

answered Nov 10 '22 by luk32