Complexity of Nested for Loops

I'm trying to figure out the time complexity of my algorithm using Big O notation, and I couldn't find a clear explanation of how to do it.

Basically, my algorithm consists of comparing a new array to all the other arrays in a "parent" array.

For that, I have a for loop that iterates over all the elements in the parent array, looking for an array that matches the newly created one.

Here's the code:

bool AlreadyExistingArray(List<int> array)
{
    // arrayEntries is the "parent" collection (e.g. a List<List<int>> field).
    foreach (List<int> a in arrayEntries)
    {
        // Entries of a different size can never match.
        if (a.Count != array.Count)
            continue;

        bool areEqual = true;

        // Every element of the new array must appear in this entry.
        foreach (int i in array)
        {
            if (!a.Contains(i))
            {
                areEqual = false;
                break;
            }
        }

        // Every element of this entry must also appear in the new array.
        if (areEqual)
        {
            foreach (int i in a)
            {
                if (!array.Contains(i))
                {
                    areEqual = false;
                    break;
                }
            }
        }

        if (areEqual)
            return true; // found a matching entry
    }

    return false;
}

I understand that each of the for loops is O(n) on its own, but how should the complexities be combined? Since I'm dealing with arrays of different sizes, I'm fairly sure the overall complexity cannot simply be called O(n^2).

Hope I made myself clear! Otherwise, let me know and I'll try to clarify further.

Edit: changed algorithm.

asked Mar 11 '23 by Nicolas Reichert

2 Answers

A Little Background on Big-Oh:

The reason you are getting so many varying answers here is that Big-O analysis is not as simple as just counting the number of steps a program runs. The simplified version taught to computer science students is an approximation (one that is usually sufficient) of the concept of Big-O, Big-Omega, and Big-Theta asymptotic bounds on functions. There is a formal definition of Big-O that eradicates these ambiguities.

Nevertheless, let us persist. My answer from the comments was this:

Call the size of arrayEntries n (len(arrayEntries) = n), the size of array m (len(array) = m), and the size of the largest entry in arrayEntries k (that is, the size of the largest of the arrays you call a). Then your algorithm is O(n(m+k)). Whichever of n, m, or k are constants that will never change size, just remove them from that expression.

Let me explain the above. The definition of Big-O is roughly:

In computer science, big O notation is used to classify algorithms by how they respond to changes in input size, such as how the processing time of an algorithm changes as the problem size becomes extremely large.

When you say an algorithm is O(n^2), this is what it means: you are saying that the runtime of your algorithm can be denoted by some function T(n) (note that it has only one variable, n), and asymptotically T(n) grows no faster than n^2 (very roughly, as n gets very large, the gradient of the function f(n) = n^2 at n will be bigger than the gradient of T(n) at n).
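For concreteness, the formal definition alluded to above is standard: T(n) = O(f(n)) means there exist constants c > 0 and n0 such that T(n) <= c * f(n) for all n >= n0. Saying an algorithm is O(n^2) is thus exactly the claim that some constant multiple of n^2 eventually dominates its runtime.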

Now in the previous example, the runtime of the algorithm depended on one variable n. This is not always the case. Imagine you have a program like this:

void algorithm(int[] arrayM, int[] arrayN) {
    for (int i : arrayN) {
        // runs len(arrayN) = n times
    }
    for (int j : arrayM) {
        // runs len(arrayM) = m times
    }
}

The runtime of this algorithm is a function that depends on the sizes of arrayM and arrayN, so it would be denoted T(n,m). Now if these sizes are independent variables (i.e. the size of arrayM has no relation to the size of arrayN), then the runtime of this algorithm is O(m+n). However, if the size of arrayM did depend on the size of arrayN (say it was initialized by copying half the elements of arrayN into it), then len(arrayM) = m actually depends on n, such that m = n/2. Your algorithm's time complexity, which was previously O(m+n), is now O(n + n/2) = O(n). This is because your runtime function T(n,m) can now be written as T(n, n/2) ~ T(n), i.e. it is a function of one variable.
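As a minimal sketch of the dependent case (hypothetical names, in C# to match the question's code): if arrayM is built from half of arrayN, then m is a function of n and the total work collapses to O(n).

void Algorithm(int[] arrayN)
{
    // arrayM is derived from arrayN, so m = n/2 depends on n.
    int[] arrayM = new int[arrayN.Length / 2];
    System.Array.Copy(arrayN, arrayM, arrayM.Length);

    foreach (int i in arrayN) { /* runs n times   */ }
    foreach (int j in arrayM) { /* runs n/2 times */ }

    // Total: n + n/2 steps, i.e. O(n), not O(n + m) with an independent m.
}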

Your Specific Example:

Now in the case of your program, initially let us assume that the size of arrayEntries (len(arrayEntries) = n), the size of array (len(array) = m), and the size of the largest entry in arrayEntries (the largest possible a, of size k) are completely independent variables that don't depend on each other. This is not an unreasonable assumption, since you said in one of your comments that "the inner loops do not depend on any value of the outer one": n is the number of user inputs, and the length of each input might be arbitrarily long, as one user might input something of length 1 and another might input a line 1000 characters long.

Since n, m, and k are independent, your complexity works out as follows. Your outer loop runs n times, and within each iteration, the first inner loop runs up to m times and the second one runs at worst k times. Thus your total complexity is O(n(m+k)).
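To make that counting explicit, here is the loop structure of your code as a C# sketch, annotated with iteration counts (assuming List<int> from System.Collections.Generic; the analysis counts loop iterations and treats each membership test as a single step):

void CountIterations(List<int> array, List<List<int>> arrayEntries)
{
    foreach (List<int> a in arrayEntries)   // runs n times
    {
        foreach (int i in array)            // up to m iterations per entry
        {
            // membership test of i against a
        }
        foreach (int j in a)                // up to k iterations per entry
        {
            // membership test of j against array
        }
    }
    // Per outer iteration: at most m + k inner iterations,
    // so the total is n * (m + k), i.e. O(n(m+k)).
}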

But is that all? Well, not quite. There are some bounds on n, m, and k. Namely, the length of the user input (k) probably has a limit (if it is being taken from stdin, it probably won't be longer than 1000 chars). If so, you could reasonably say that the worst k could be is 1000, treat it like a constant, and conclude that the complexity of your algorithm is O(nm), because the constant k is eradicated. This step depends wholly on how you think your program will be used, but if you want to be safe, you could just say the complexity is O(n(m+k)).

This does, however, raise the question: could one not argue that m and n are also bounded, because there is a limit on how large they can be (namely, how much memory your operating system will allocate to your program), and thus treat them as constants too? Technically, yes. And in some algorithmic analyses this is a useful thing to do (such as in the case of slow-growing algorithms).

Ultimately it all depends on how you think your program will be used and which assumptions are reasonable to make (e.g. that k can be considered a constant) and which ones to avoid. In my personal opinion, this algorithm is O(n(m+k)) and possibly O(nm), but I wouldn't reduce it further than that; m and n seem pretty independent from your descriptions.

Important Edit (answer above is not correct after your code edit):

An interesting case study is the point raised in the comments on this answer by @frenzykryger, a detail I missed because you edited your question while I was writing the answer. You changed the start of your outer loop to check whether the size of a is equal to the size of array. This means the number of times your second inner loop runs (the k described above) now depends completely on m (recall that m is the size of array); namely, k = m. Thus your algorithm is O(n(m+m)) = O(nm). Now, if you are guaranteed that m is always less than n, then the algorithm is O(n^2) (the m can be replaced by n, since nm < n^2). But if m is unbounded (it could be any size), the algorithm remains O(nm).

Final Remarks:

As you can see, Big-Oh analysis is something that doesn't always have a single right answer. It all depends on how your program will behave, what sort of inputs you are guaranteed to get, and many other factors. If all this makes you wish there were a more rigorous way to define it, there certainly is: just google "Big-Oh formal definition", read some links, head on over to Mathematics Stack Exchange, and you'll have yourself a guaranteed party.

answered Mar 21 '23 by gowrath


Big O notation is concerned with how computation grows as your application scales. Thus there are three key questions here that will determine the Big O complexity, all of which relate to how your application scales:

  1. As your application scales, does the number of elements in your "new array" grow? In other words, early in the life of your application there might be 20 elements in this "new array", but as the application gains more users (or data, or whatever), might the count grow to n elements?
  2. As your application scales, does the number of arrays in "all other arrays in the 'parent' array" grow? Could it go from comparing one array against a handful of other arrays to comparing it against n other arrays?
  3. As your application scales, does the number of elements in "all the other arrays in the 'parent' array" grow? In other words, when comparing against these other arrays, could each one go from containing, say, 20 elements to containing n elements?

Count the number of "yes" answers to the questions above; they represent your n's. Three "yes" answers indicate O(n^3), as the sketch below illustrates.
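As a rough sketch of the three-"yes" case (hypothetical names, in C# to match the question's code): if the parent collection, each stored array, and the new array can all grow to size n, then a direct containment check touches up to n * n * n element pairs, hence O(n^3).

bool ExistsMatch(List<int> newArray, List<List<int>> parent)
{
    foreach (List<int> stored in parent)    // up to n stored arrays
    {
        bool allFound = true;
        foreach (int x in newArray)         // up to n elements in the new array
        {
            bool found = false;
            foreach (int y in stored)       // up to n elements per stored array
            {
                if (x == y) { found = true; break; }
            }
            if (!found) { allFound = false; break; }
        }
        if (allFound)
            return true;    // worst case overall: n * n * n comparisons
    }
    return false;
}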

A bit more on Big O. As counter-intuitive as it may seem, comparing an array of 500 elements to another array of 500 elements, if neither array ever changes size, has a time complexity of O(1). Even though up to 250,000 comparisons must be made, the number of comparisons never changes, so the complexity reduces to a constant. This begins to make sense when you start to imagine the possibilities for caching results, sorting, or choosing very efficient data structures or search algorithms.

Now imagine checking a document of variable size against a word list of fixed size (e.g. a spell checker checking a document against all 171,000 words in the Oxford English Dictionary). The document has n words, but surely the 171,000 words in the OED must have some impact on the time complexity?

Actually, no. The complexity would be O(n), because the only variable is the size of the document. This also begins to make sense when you read about some data structures that make word lookup very fast against a known word list (e.g. a Trie). The time to check a document scales linearly with the size of the document.
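A minimal sketch of that idea in C# (using a HashSet rather than a trie for brevity; the tiny word list is a stand-in for the fixed dictionary):

using System;
using System.Collections.Generic;

class SpellCheckSketch
{
    static void Main()
    {
        // Building the set is a one-time cost over a word list of fixed size.
        var dictionary = new HashSet<string>(
            new[] { "the", "cat", "sat" },
            StringComparer.OrdinalIgnoreCase);

        string[] documentWords = "The cat saat".Split(' ');

        // O(n) in the number of words in the document:
        // each lookup is O(1) on average, regardless of dictionary size.
        foreach (string word in documentWords)
        {
            if (!dictionary.Contains(word))
                Console.WriteLine("Misspelled: " + word);
        }
    }
}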

We would get O(n^2) complexity if we were comparing, say, a document of size n against another document of size n. In this case, comparing two documents of 1,000 words each would not scale linearly as the document size increases to 10,000,000,000; if the size of the documents were a variable expected to grow, we would likely have to rethink our approach.

answered Mar 21 '23 by Nate Vaughan