How to handle Big O when one variable is known to be smaller than another one?

We have 4 algorithms, all of them with complexity depending on m and n:

Alg1: O(m + n)
Alg2: O(m log m + n log n)
Alg3: O(m log n + n log m)
Alg4: O(m + n!) (ouch, this one sucks, but whatever)

Now, how do we handle this if we know that n > m? My first thought is: Big O notation "discards" constants and smaller terms because, sooner or later, the bigger term will overwhelm all the others, making them irrelevant to the computation cost.

So, can we rewrite Alg1 as O(n), or Alg2 as O(n log n)? If so, what about the others?

asked Oct 23 '17 by justHelloWorld



2 Answers

Yes, you can rewrite it if you know that it is always the case that n > m. Formally, have a look at this:

If we know that n > m (always), then it follows that

O(m + n) < O(n + n), which is O(2n) = O(n) (we don't really care about the 2)

We can say the same thing about the other algorithms as well:

O(m log m + n log n) < O(n log n + n log n) = O(2n log n) = O(n log n)

I think you can see where the rest of them are going. But if you do not know that n > m, then you cannot say the above.
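As a quick sanity check (a sketch of mine in Python, not part of the original answer), you can verify these inequalities numerically for random pairs with n > m:

    import math
    import random

    # For random pairs with n > m, the mixed-variable bounds are at most
    # twice their "replace every m by n" counterparts.
    for _ in range(1000):
        n = random.randint(2, 10**6)
        m = random.randint(1, n - 1)  # guarantees m < n
        assert m + n <= 2 * n                                            # Alg1
        assert m * math.log(m) + n * math.log(n) <= 2 * n * math.log(n)  # Alg2
        assert m * math.log(n) + n * math.log(m) <= 2 * n * math.log(n)  # Alg3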

EDIT: as @rici nicely pointed out, you also need to be careful, since it will always depend on the given function. (Note that O((2n)!) cannot be simplified to O(n!).)

With a bit of playing around you can see why:

(2n)! = (2n) * (2n-1) * (2n-2) * ... * 2 * 1

Taking just the n even factors gives (2n) * (2n-2) * ... * 2 = 2^n * n!, so

(2n)! >= 2^n * n!

Thus (2n)! grows at least as fast as 2^n * n!, which is far larger than n!. For a more precise calculation, see this thread: Are the two complexities O((2n + 1)!) and O(n!) equal?
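To make that concrete, here's a tiny check (my sketch, not the answerer's): if O((2n)!) were the same as O(n!), the ratio (2n)!/n! would stay bounded by a constant, but it blows up immediately:

    from math import factorial

    # If (2n)! were O(n!), this ratio would be bounded by a constant.
    for n in range(1, 8):
        print(n, factorial(2 * n) // factorial(n))
    # 1 2
    # 2 12
    # 3 120
    # ... grows without bound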

answered Nov 13 '22 by TNguyen


Consider the problem of finding the k largest elements of an array. There's a nice O(n log k)-time algorithm for solving this using only O(k) space that works by maintaining a binary heap of at most k elements. In this case, since k is guaranteed to be no bigger than n, we could have rewritten these bounds as O(n log n) time with O(n) memory, but that ends up being less precise. The direct dependency of the runtime and memory usage on k makes clear that this algorithm takes more time and uses more memory as k grows.
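For concreteness, here's a minimal sketch of that heap-based approach (the function name is mine, not from the answer): a min-heap holds the best k elements seen so far, so each of the n input elements costs at most O(log k) heap work.

    import heapq

    def k_largest(items, k):
        """Return the k largest elements in O(n log k) time and O(k) space."""
        if k <= 0:
            return []
        heap = []                          # min-heap of the current top k
        for x in items:
            if len(heap) < k:
                heapq.heappush(heap, x)    # O(log k)
            elif x > heap[0]:              # x beats the smallest kept element
                heapq.heapreplace(heap, x) # pop + push in O(log k)
        return sorted(heap, reverse=True)

    print(k_largest([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]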

Similarly, consider many standard graph algorithms, like Dijkstra's algorithm. If you implement Dijkstra's algorithm using a Fibonacci heap, the runtime works out to O(m + n log n), where m is the number of edges and n is the number of nodes. If you assume your input graph is connected, then m >= n - 1, so the runtime also happens to be O(m + m log m), but that's a less precise bound than the one that we had.

On the other hand, if you implement Dijkstra's algorithm with a binary heap, then the runtime works out to O(m log n + n log n). In this case (again, assuming the graph is connected), the m log n term strictly dominates the n log n term, and so rewriting this as O(m log n) doesn't lose any precision.
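Here's a sketch of that binary-heap variant (assuming an adjacency-list representation; the names are illustrative, not from the answer). Each of the up-to-m edge relaxations can push onto the heap, giving the O(m log n) term, and the pops contribute the O(n log n) term:

    import heapq

    def dijkstra(graph, source):
        """graph: {u: [(v, weight), ...]} -> shortest distances from source."""
        dist = {source: 0}
        heap = [(0, source)]                        # (distance, node)
        while heap:
            d, u = heapq.heappop(heap)              # O(log n) per pop
            if d > dist.get(u, float("inf")):
                continue                            # stale entry, skip it
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))   # up to m pushes, O(log n)
        return dist

    print(dijkstra({"a": [("b", 2), ("c", 5)], "b": [("c", 1)]}, "a"))
    # {'a': 0, 'b': 2, 'c': 3}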

Generally speaking, you'll want to give the most precise bounds that you can in the course of documenting the runtime and memory usage of a piece of code. If you have multiple terms where one clearly strictly dominates the other, you can safely discard those lower terms. But otherwise, I wouldn't recommend discarding one of the variables, since that loses precision in the bound you're giving.

answered Nov 13 '22 by templatetypedef