Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Searching for a "needle" in a two dimnesional "haystack" [closed]

I guess this is one of the most commonly asked interview questions, yet I am unable to solve it in an efficient way(efficient meaning lesser time complexity and use of a suitable Data Structure). The problem is this way: If there is a m x n matrix of chars (say haystack) and a given char string of length k (the needle) . Write a program to check if the haystack contains the needle. Please note that we need to search the haystack only top to down or left to right. For example

Haystack

ahydsfd
sdflddl
dfdfd
dfdl
uifddffdhc

Needle:
hdffi

Output:
Yes Found!!
like image 331
hytriutucx Avatar asked Mar 01 '12 10:03

hytriutucx


1 Answers

The naive bruteforce is O(m*n*k). Here are some ideas for optimization.

Single Search
Instead of doing a search for horizontals and then another for verticals, do both simultaneously. Every time you find an occurrence of the first letter of the needle look for a horizontal and a vertical match starting at that letter. This won't improve the time complexity, but could halve the time since you'll only look at bad starts once.

Rare Letters
Instead of looking for the first letter of the needle, look for the rarest letter which occurs in the needle. This could rule out a lot of the possible matches quickly (though it won't improve the worst-case time complexity). To determine which letters are rarest either scan through the entire board or use random sampling.

Efficient String Searching Algorithm
String searching is a well-studied problem and there are several linear-time algorithms such as Knuth–Morris–Pratt and Boyer–Moore. Using a linear-time string searching algorithm to search each each row and each column reduces the time complexity to O(m*n). This is probably what the interviewers are after.

Exploit Short Rows
I notice that not all rows have the same length. When you look for vertical matches, you can stop searching on that row as soon as the needle 'pops out' of the sack, since all needles further along the row will also exit the sack and therefore cannot match.

like image 193
tom Avatar answered Nov 15 '22 10:11

tom