Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Big O Notation for string matching algo

What would the big O notation of the function foo be?

int foo(char *s1, char *s2)
{
   int c=0, s, p, found;
   for (s=0; s1[s] != '\0'; s++)
   {
      for (p=0, found=0; s2[p] != '\0'; p++)
      {
         if (s2[p] == s1[s])
         {
            found = 1;
            break;
         }
      }
      if (!found) c++;
   }
   return c;
}

What is the efficiency of the function foo?

a) O(n!)

b) O(n^2)

c) O(n lg(base2) n )

d) O(n)

I would have said O(MN)...?

like image 281
NightWolf Avatar asked Jun 16 '11 13:06

NightWolf


People also ask

Which is the best string matching algorithm?

The so-called naive or brute force algorithm is the most intuitive approach to the string pattern-matching problem. This algorithm attempts simply to match the pattern in the target at successive positions from left to right.

What are strings write an algorithm for string pattern matching?

Exact String Matching Algorithms: Exact string matching algorithms is to find one, several, or all occurrences of a defined string (pattern) in a large string (text or sequences) such that each matching is perfect. All alphabets of patterns must be matched to corresponding matched subsequence.


1 Answers

It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:

f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N

By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":

"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."

There is actually a formal definition for Big O with multiple variables however this requires extra constraints beyond single variable Big O be met, and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables to a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:

Method 1

Let x1 = length(s1)
Let x2 = length(s2)

The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.

Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).

Method 2 (in case you don't like the first method)

For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.

For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n) We use the following simplification rule which can be found here to simplify to O(n²):

If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.

Conclusion

for n = x1 (our limiting variable), such that x1 >= x2, the worst case scenario is x1 = x2. Therefore: f(x1) ∈ O(n²)

Extra Hint

For all homework problems posted to SO related to Big O notation, if the answer is not one of:

O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)

Then the question is probably better off being posted to https://math.stackexchange.com/

like image 167
Ethan Cabiac Avatar answered Nov 15 '22 15:11

Ethan Cabiac