Find all (English word) substrings of a given string

Tags:

algorithm

data-structures

This is an interview question: find all substrings of a given string that are English words (for example, every = every, ever, very).

Obviously, we can loop over all substrings and check each one against an English dictionary organized as a set. I believe the dictionary is small enough to fit in RAM. How should the dictionary be organized? As far as I remember, the original spell command loaded the words file into a bitmap that represented the set of word hash values. I would start from that.
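The set-based approach can be sketched roughly as follows. This is only an illustration: the function name find_words_bruteforce and the three-word dictionary are assumptions made for the example, and a built-in Python set stands in for the hash bitmap mentioned above. Note that each lookup also hashes the candidate substring, so the total work can exceed O(n^2) unless candidates are capped at the longest dictionary word.

```python
def find_words_bruteforce(text, dictionary):
    """Return the distinct substrings of text found in dictionary,
    in order of first occurrence."""
    found = []
    seen = set()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            candidate = text[start:end]  # slicing and hashing cost O(length)
            if candidate in dictionary and candidate not in seen:
                seen.add(candidate)
                found.append(candidate)
    return found


# Toy dictionary (an assumption for the example, not a real word list).
words = {"every", "ever", "very"}
print(find_words_bruteforce("every", words))  # ['ever', 'every', 'very']
```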

Another solution is a trie built from the dictionary. Using the trie, we can loop over all characters of the string and, starting from each one, walk the trie for as long as the following characters match. I guess the complexity of this solution would be the same in the worst case (O(n^2)).
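A minimal sketch of the trie idea, assuming a simple node class with a dict of children; the names TrieNode, build_trie and find_words_trie are hypothetical and chosen only for this example. The inner loop stops as soon as no dictionary word continues with the next character, so the work per starting position is bounded by the longest dictionary word rather than by the length of the string.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if a dictionary word ends at this node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find_words_trie(text, root):
    """Return (start_index, word) for every dictionary word occurring in text."""
    matches = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break  # no dictionary word continues with this character
            if node.is_word:
                matches.append((start, text[start:end + 1]))
    return matches


trie = build_trie(["every", "ever", "very"])
print(find_words_trie("every", trie))  # [(0, 'ever'), (0, 'every'), (1, 'very')]
```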

Does it make sense? Would you suggest other solutions?


asked Mar 02 '11 18:03

Michael

1 Answer

Consider the Aho-Corasick string matching algorithm, which "constructs a finite state machine that resembles a trie with additional links between the various internal nodes."
But, all things considered, the approach "build a trie from the English dictionary and do a simultaneous search on it for all suffixes of the given string" should be pretty good for an interview.
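For comparison, here is a compact sketch of an Aho-Corasick style automaton. It is an illustration written for this answer, not a reference implementation; the names build_automaton and find_words_aho_corasick and the list-based state layout are assumptions. The "additional links" are the failure links stored in fail, which let the scan continue from the longest matching suffix instead of restarting at every position.

```python
from collections import deque


def build_automaton(words):
    goto = [{}]  # goto[s] maps a character to the next state
    fail = [0]   # fail[s] is the failure link of state s
    out = [[]]   # out[s] lists the words that end when state s is reached
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep the root as failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the words recognized through the failure link.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_words_aho_corasick(text, words):
    """Return (start_index, word) for every dictionary word occurring in text."""
    goto, fail, out = build_automaton(words)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches


print(sorted(find_words_aho_corasick("every", ["every", "ever", "very"])))
```

The text is scanned once, so the running time is proportional to the length of the string plus the number of matches reported, after a one-time cost for building the automaton from the dictionary.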


answered Sep 21 '22 13:09

Eugen Constantin Dinca
