Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to know whether a string can be segmented into two strings

I was asked in interview following question. I could not figure out how to approach this question. Please guide me.

Question: How to know whether a string can be segmented into two strings - like breadbanana is segmentable into bread and banana, while breadbanan is not. You will be given a dictionary which contains all the valid words.

like image 743
Bhaskar Avatar asked Mar 06 '13 07:03

Bhaskar


People also ask

What is string segmentation?

string segment A substring of a character string that can usually only be replaced by an array of the same size. A Dictionary of Computing. "string segment ."


1 Answers

Build a trie of the words you have in the dictionary, which will make searching faster. Search the tree according to the following letters of your input string. When you've found a word, which is in the tree, recursively start from the position after that word in the input string. If you get to the end of the input string, you've found one possible fragmentation. If you got stuck, come back and recursively try another words.

EDIT: sorry, missed the fact, that there must be just two words. In this case, limit the recursion depth to 2.

The pseudocode for 2 words would be:

T = trie of words in the dictionary
for every word in T, which can be found going down the tree by choosing the next letter of the input string each time we move to the child:
    p <- length(word)
    if T contains input_string[p:length(intput_string)]:
        return true
return false

Assuming you can go down to a child node in the trie in O(1) (ascii indexes of children), you can find all prefixes of the input string in O(n+p), where p is the number of prefixes, and n the length of the input. Upper bound on this is O(n+m), where m is the number of words in dictionary. Checking for containing will take O(w) where w is the length of word, for which the upper bound would be m, so the time complexity of the algorithm is O(nm), since O(n) is distributed in the first phase between all found words.

But because we can't find more than n words in the first phase, the complexity is also limited to O(n^2). So the search complexity would be O(n*min(n, m)) Before that you need to build the trie which will take O(s), where s is the sum of lengths of words in the dictionary. The upper bound on this is O(n*m), since the maximum length of every word is n.

like image 52
Michał Trybus Avatar answered Sep 21 '22 00:09

Michał Trybus