Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Match sub-string within a string with tolerance of 1 character mismatch

Tags:

c

string

amazon

I was going through some Amazon interview questions on CareerCup.com, and I came across this interesting question which I haven't been able to figure out how to do. I have been thinking on this since 2 days. Either I am taking a way off approach, or its a genuinely hard function to write.

Question is as follows:

Write a function in C that can find if a string is a sub-string of another. Note that a mismatch of one character should be ignored.

A mismatch can be an extra character: ’dog’ matches ‘xxxdoogyyyy’  
A mismatch can be a missing character: ’dog’ matches ‘xxxdgyyyy’ 
A mismatch can be a different character: ’dog’ matches ‘xxxdigyyyy’

The return value wasn't mentioned in the question, so I assume the signature of the function can be something like this:

char * MatchWithTolerance(const char * str, const char * substr);

If there is a match with the given rules, return the pointer to the beginning of matched substring within the string. Else return null.

Bonus

If someone can also figure out a generic way of making the tolerance to n instead of 1, then that would be just brilliant. In that case the signature would be:

char * MatchWithTolerance(const char * str, const char * substr, unsigned int tolerance = 1);

Thanks to all those who would attempt this and share their successful solution.

like image 568
bits Avatar asked Jul 24 '10 17:07

bits


3 Answers

This seems to work, let me know if you find any errors and I'll try to fix them:

int findHelper(const char *str, const char *substr, int mustMatch = 0)
{
    if ( *substr == '\0' )
        return 1;

    if ( *str == '\0' )
        return 0;

    if ( *str == *substr )
        return findHelper(str + 1, substr + 1, mustMatch);
    else
    {
        if ( mustMatch )
            return 0;

        if ( *(str + 1) == *substr )
            return findHelper(str + 1, substr, 1);
        else if ( *str == *(substr + 1) )
            return findHelper(str, substr + 1, 1);
        else if ( *(str + 1) == *(substr + 1) )
            return findHelper(str + 1, substr + 1, 1);
        else if ( *(substr + 1) == '\0' )
            return 1;
        else
            return 0;
    }
}

int find(const char *str, const char *substr)
{
    int ok = 0;
    while ( *str != '\0' )
        ok |= findHelper(str++, substr, 0);

    return ok;
}


int main()
{
    printf("%d\n", find("xxxdoogyyyy", "dog"));
    printf("%d\n", find("xxxdgyyyy", "dog"));
    printf("%d\n", find("xxxdigyyyy", "dog"));
}

Basically, I make sure only one character can differ, and run the function that does this for every suffix of the haystack.

like image 89
IVlad Avatar answered Sep 29 '22 02:09

IVlad


This is related to a classical problem of IT, referred to as Levenshtein distance. See Wikibooks for a bunch of implementations in different languages.

like image 29
user123444555621 Avatar answered Sep 29 '22 01:09

user123444555621


This is slightly different than the earlier solution, but I was intrigued by the problem and wanted to give it a shot. Obviously optimize if desired, I just wanted a solution.

char *match(char *str, char *substr, int tolerance)
{
  if (! *substr) return str;
  if (! *str) return NULL;

  while (*str)
  {
    char *str_p;
    char *substr_p;
    char *matches_missing;
    char *matches_mismatched;

    str_p = str;
    substr_p = substr;

    while (*str_p && *substr_p && *str_p == *substr_p)
    {
      str_p++;
      substr_p++;
    }
    if (! *substr_p) return str;
    if (! tolerance)
    {
      str++;
      continue;
    }

    if (strlen(substr_p) <= tolerance) return str;

    /* missed due to a missing letter */
    matches_missing = match(str_p, substr_p + 1, tolerance - 1);
    if (matches_missing == str_p) return str;

    /* missed due to a mismatch of letters */
    matches_mismatched = match(str_p + 1, substr_p + 1, tolerance - 1);
    if (matches_mismatched == str_p + 1) return str;

    str++;
  }
  return NULL;
}
like image 35
Brandon Horsley Avatar answered Sep 29 '22 02:09

Brandon Horsley