I found some references about big O notation, but as far as I can understand, algorithm complexity is a function of the size of the input data.
For example, if the complexity of bubble sort is O(n^2), then n is the size of the input array. Right?
But how can I determine the complexity of an algorithm that has a fixed input size and whose runtime depends on the values of the input? For example, finding the Greatest Common Divisor (GCD) would look like this:
def GCD(x, y):
    while x != y:
        if x < y:
            x, y = y - x, x
        else:
            x, y = x - y, x
    return x
What is the complexity of this algorithm? And how is it determined?
Edit: Changed name of the function and corrected name of algorithm. ShreevatsaR, thanks for pointing it out.
If the time complexity does not depend on the input size, i.e., the algorithm has the same runtime regardless of the input, then it is O(1). For example, a loop or recursion that runs a constant number of times has O(1) time complexity.
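As a minimal sketch (the function and the constant 10 are my own, purely for illustration), a loop that always runs a fixed number of times does the same amount of work no matter what argument is passed in:

def constant_work(n):
    # The loop body executes exactly 10 times regardless of n, so the
    # total work is bounded by a constant: O(1).
    total = 0
    for _ in range(10):
        total += n
    return total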
Algorithmic complexity is concerned with how fast or slow a particular algorithm performs. We define complexity as a numerical function T(n): time versus the input size n. We want to describe the time taken by an algorithm without depending on implementation details.
Definition: For a given algorithm, the input size is defined as the number of characters it takes to write the input. Reasonable encodings (say, binary versus decimal digits) differ in length only by a constant factor, so an algorithm is polynomial-time in terms of one of these input sizes if and only if it is polynomial-time in terms of the other.
To visualize how our code scales up, we use Big O notation, which describes the limiting behavior of a function as its argument tends towards a particular value or infinity. In essence, it shows us how our algorithm performs as its input size gets larger.
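For instance, one way to get a feel for T(n) is to count a representative operation as the input grows. The helper below is a hypothetical example (not from the original text): it counts the comparisons made by a linear search, whose worst case grows in direct proportion to n.

def count_comparisons_linear_search(items, target):
    # Counts the comparisons a linear search performs.
    # In the worst case (target absent) that is len(items), so the search is O(n).
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            break
    return comparisons

print(count_comparisons_linear_search(list(range(10)), -1))  # 10
print(count_comparisons_linear_search(list(range(20)), -1))  # 20 -- doubling n doubles the worst-case count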
People play a bit fast and loose with big-O notation. In the case of GCD, they generally do it in 2 ways:
1) You're right, algorithmic complexity, and hence big-O notation, should be stated in terms of the size in bits of the input, not in terms of the values of the input. That's how P, NP, and so on are defined. Assuming binary input and arbitrarily-large numbers (like a BigNum representation), and N the number of bits of input, your GCD requires at most 2^N subtractions, each of which requires time O(N) to run over each digit of the numbers being subtracted. So it's O(N*2^N). GCD can of course be done much faster if you use division rather than subtraction: O(N^2).
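For reference, a division-based version of Euclid's algorithm might look like the following minimal sketch (the function name is my own); each iteration replaces a long run of subtractions with a single modulo step:

def gcd_division(x, y):
    # Division-based Euclid's algorithm: the larger value shrinks much
    # faster than with repeated subtraction.
    while y != 0:
        x, y = y, x % y
    return x

print(gcd_division(1071, 462))  # 21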
So, when we say that testing primality was proved in 2002 to be done in polynomial time, that's the technical definition of complexity, and we mean polynomial in the number of digits (which is the tricky part), not polynomial in the input number itself (which is trivially easy to do in "sub-linear time" using trial division).
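As a sketch of what trial division means here (the function name is just for illustration): the loop below does on the order of sqrt(n) work, which is small compared to the value n but exponential in the number of digits of n.

def is_prime_trial_division(n):
    # Tries every candidate divisor up to sqrt(n): roughly sqrt(n) steps,
    # i.e. "sub-linear" in the value n but exponential in its digit count.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime_trial_division(97))   # True
print(is_prime_trial_division(100))  # False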
But in practice, for algorithms which take a fixed number of integer inputs, it's more convenient to talk about complexity as though N were the input itself, not the size of the input. It's not harmful as long as you're clear what you mean in cases that might be ambiguous.
2) In practice, integer inputs are often fixed-size, 32 bit or whatever, and operations on them such as addition, multiplication and division are O(1) time. We use these facts selectively in our order analysis. Technically if your GCD program only accepts inputs up to (2^32-1), then it is O(1). It has a fixed upper bound on its running time. End of analysis.
Although technically correct that's not a very useful analysis. Almost anything you do on a real computer is O(1) on the same basis, that the size of the problem is constrained by the hardware.
It's usually more convenient to accept that addition is O(1) because the numbers are fixed-size, but ignore that GCD is also O(1), pretend that its behaviour in the range [1, 2^32) extends to infinity, and analyse it on that basis. Then with N the max of the two inputs, it comes out O(N): O(N) subtractions, each taking constant time.
Again, this is not ambiguous once you know what the terms of reference are, but beware of incorrectly comparing the first analysis I gave of Euclid's algorithm with division, O(N^2), against this analysis of the algorithm with subtraction, O(N). N is not the same in each, and subtraction is not faster ;-)
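To make that concrete, here is a small sketch (the helpers are my own) that counts loop iterations of the question's subtraction-based version and of a division-based version on the same inputs:

def gcd_subtraction_steps(x, y):
    # Counts iterations of the subtraction-based loop from the question.
    steps = 0
    while x != y:
        steps += 1
        if x < y:
            x, y = y - x, x
        else:
            x, y = x - y, x
    return x, steps

def gcd_division_steps(x, y):
    # Counts iterations of the division-based (modulo) version.
    steps = 0
    while y != 0:
        steps += 1
        x, y = y, x % y
    return x, steps

print(gcd_subtraction_steps(10**6, 1))  # over a million loop iterations
print(gcd_division_steps(10**6, 1))     # a single modulo step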
Big-O notation should specify what is being measured.
For example, Big-O notation for sort algorithms usually measures the number of comparisons.
Your GCD example can be measured by comparing the values of x and y against the number of instructions executed. Let's look more closely:
def GCD(x, y):
    while x != y:            # 1
        if x < y:            # 2
            x, y = y - x, x  # 3
        else:                # 4
            x, y = x - y, x  # 5
    return x                 # 6
Work from the inside to the outside.
No matter the values of x and y, steps 3 and 5 will always take one instruction. Therefore, the if statement of step 2 will always take two instructions.
The harder question is step 1. With every iteration, either x or y will be lowered by the smaller value. Assume that x > y. One of two things will happen: if it started that x % y == 0, then the loop will be executed (x / y) - 1 times and the program will stop; otherwise, x will be reduced (x / y) times before it's smaller than y, and the program will continue.
You can easily measure the number of instructions for any given x and y. You can easily show that, for a given number z, you will never need more than z - 1 subtractions -- or 2 * (z - 1) instructions. (Think about gcd(z, 1).)
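As a rough check of that bound, here is an instrumented sketch; note that it uses a simplified subtraction-only GCD (without the swapping in the question's code) so the z - 1 count is easy to see:

def gcd_subtract_count(x, y):
    # Simplified subtraction-only GCD, instrumented to count subtractions.
    subtractions = 0
    while x != y:
        if x > y:
            x = x - y
        else:
            y = y - x
        subtractions += 1
    return x, subtractions

print(gcd_subtract_count(10, 1))   # (1, 9)  -> z - 1 subtractions for gcd(z, 1)
print(gcd_subtract_count(100, 1))  # (1, 99)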
The input size is the sum of the sizes of the numbers x and y (e.g. how many digits are in each number).
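As a minimal sketch of measuring that input size in code (counting bits rather than decimal digits only changes the result by a constant factor):

def input_size_in_bits(x, y):
    # One reasonable measure of input size: the total number of bits
    # needed to write down both numbers.
    return x.bit_length() + y.bit_length()

print(input_size_in_bits(1071, 462))  # 11 + 9 = 20 bits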
Big O complexity is the worst-case asymptotic runtime behavior. It is not necessarily a function of the input size (the quantity of inputs) to a particular algorithm, though it often is. In other words, it is the limiting function that describes the runtime as the inputs are taken to infinity.
In your case, if x or y is unbounded, so is the asymptotic runtime. If not, think about the running time when x = 1 and y = Int32.Max.