I was going through some Amazon interview questions on CareerCup.com, and I came across this interesting question which I haven't been able to figure out how to do. I have been thinking on this since 2 days. Either I am taking a way off approach, or its a genuinely hard function to write.
Question is as follows:
Write a function in C that can find if a string is a sub-string of another. Note that a mismatch of one character should be ignored.
A mismatch can be an extra character: ’dog’ matches ‘xxxdoogyyyy’
A mismatch can be a missing character: ’dog’ matches ‘xxxdgyyyy’
A mismatch can be a different character: ’dog’ matches ‘xxxdigyyyy’
The return value wasn't mentioned in the question, so I assume the signature of the function can be something like this:
char * MatchWithTolerance(const char * str, const char * substr);
If there is a match with the given rules, return the pointer to the beginning of matched substring within the string. Else return null.
Bonus
If someone can also figure out a generic way of making the tolerance to n instead of 1, then that would be just brilliant. In that case the signature would be:
char * MatchWithTolerance(const char * str, const char * substr, unsigned int tolerance = 1);
Thanks to all those who would attempt this and share their successful solution.
This seems to work, let me know if you find any errors and I'll try to fix them:
int findHelper(const char *str, const char *substr, int mustMatch = 0)
{
if ( *substr == '\0' )
return 1;
if ( *str == '\0' )
return 0;
if ( *str == *substr )
return findHelper(str + 1, substr + 1, mustMatch);
else
{
if ( mustMatch )
return 0;
if ( *(str + 1) == *substr )
return findHelper(str + 1, substr, 1);
else if ( *str == *(substr + 1) )
return findHelper(str, substr + 1, 1);
else if ( *(str + 1) == *(substr + 1) )
return findHelper(str + 1, substr + 1, 1);
else if ( *(substr + 1) == '\0' )
return 1;
else
return 0;
}
}
int find(const char *str, const char *substr)
{
int ok = 0;
while ( *str != '\0' )
ok |= findHelper(str++, substr, 0);
return ok;
}
int main()
{
printf("%d\n", find("xxxdoogyyyy", "dog"));
printf("%d\n", find("xxxdgyyyy", "dog"));
printf("%d\n", find("xxxdigyyyy", "dog"));
}
Basically, I make sure only one character can differ, and run the function that does this for every suffix of the haystack.
This is related to a classical problem of IT, referred to as Levenshtein distance. See Wikibooks for a bunch of implementations in different languages.
This is slightly different than the earlier solution, but I was intrigued by the problem and wanted to give it a shot. Obviously optimize if desired, I just wanted a solution.
char *match(char *str, char *substr, int tolerance)
{
if (! *substr) return str;
if (! *str) return NULL;
while (*str)
{
char *str_p;
char *substr_p;
char *matches_missing;
char *matches_mismatched;
str_p = str;
substr_p = substr;
while (*str_p && *substr_p && *str_p == *substr_p)
{
str_p++;
substr_p++;
}
if (! *substr_p) return str;
if (! tolerance)
{
str++;
continue;
}
if (strlen(substr_p) <= tolerance) return str;
/* missed due to a missing letter */
matches_missing = match(str_p, substr_p + 1, tolerance - 1);
if (matches_missing == str_p) return str;
/* missed due to a mismatch of letters */
matches_mismatched = match(str_p + 1, substr_p + 1, tolerance - 1);
if (matches_mismatched == str_p + 1) return str;
str++;
}
return NULL;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With