Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parsing words in a continuous string

If a have a string with words and no spaces, how should I parse those words given that I have a dictionary/list that contains those words?

For example, if my string is "thisisastringwithwords" how could I use a dictionary to create an output "this is a string with words"?

I hear that using the data structure Tries could help but maybe if someone could help with the pseudo code? For example, I was thinking that maybe you could index the dictionary into a trie structure, then follow each char down the trie; problem is, I'm unfamiliar with how to do this in (pseudo)code.

like image 666
locoboy Avatar asked Jun 20 '11 07:06

locoboy


1 Answers

I'm assuming that you want an efficient solution, not the obvious one where you repeatedly check if your text starts with a dictionary word.

If the dictionary is small enough, I think you could try and modify the standard KMP algorithm. Basically, build a finite-state machine on your dictionary which consumes the text character by character and yields the constructed words.

EDIT: It appeared that I was reinventing tries.

like image 66
Zruty Avatar answered Sep 28 '22 06:09

Zruty