How Can I Speed Up This Anagram Algorithm

Tags:

I am making a mobile app to find anagrams and partial matches. Mobile is important because there is not a whole lot of computational power, and efficiency is key.

The algorithm takes any number of letters, including repeats, and finds the longest words made up from its letters, using every letter only once. I am also interested in finding the top results quickly, and am not really concerned with the bottoms (shorter ones) as long as N is met. For example:

STACK => stack, tacks, acts, cask, cast, cats…

I have done some googling and have found a few algorithms, and I came up with one which I thought would be efficient, but is not as efficient as I would like.

I have a lookup dictionary pre-made that maps a sorted key to the real words that generate that key.

"aelpp" => ["apple", "appel", "pepla"]

I have further split each dictionary into different ones based on the length of the key. So keys that are 5 letters long are in one dictionary, keys that are 6 are in another. Each of these dictionaries are in an array in which the index is the length of the keys that are found in the dictionary.

anagramArray[5] => dictionary5
dictionary5["aelpp"] => ["apple", "appel", "pepla"]

My algorithm starts by taking an input word "lappe", and it sorts it:

"lappe" => "aelpp"

Now, for each dictionary that has contents of at most 5 letters, I do a comparison to pull it out. Here is pseudocode:

word = input.sort
for (i = word.length; i > 0; i--)
    dictionaryN = array[i]
    for (key in dictionaryN)
        if word matches key
            add to returnArray
        end
    end
    if returnArray count > N
      break
    end
end

returnArray.sort by longest word, alphabetize

The dictionary only has about 170,000 words in it, but searches are taking up to 20 seconds for 12 letter inputs. My match method makes a regex out of the key:

"ackst" => /a.*c.*k.*s.*t.*/

such that, for example, a 4 letter key such as acst (acts), will match ackst (stack) because:

"ackst" matches /a.*c.*s.*t.*/

I have seen other apps do the same thing in much less time, and I am wondering if my approach is garbage or just needs some tweaking.

How can I get the maximum computational efficiency for generating the top N anagrams from a word, sorted by max length?

641

asked Jul 02 '11 04:07

coneybeare

1 Answers

If you think of (and perhaps even represent) a dictionary as a tree of letters you can avoid looking at lots of nodes. If "stack" is in the dictionary, then there will be a path from the root to a leaf labelled a-c-k-s-t. If the input word is "attacks" then sort this to get aackstt. You can write a recursive routine to follow links down from the root, consuming letters from aackstt as you go. When you reach a-c-k you will have stt left in your string, so you can follow the s to reach ackst, but you can rule out following u to reach a-c-k-u and its descendants, v to reach a-c-k-v and its descendants, and so on.

In fact, with this scheme, you could use just one tree to hold words of any number of letters, which should save you doing multiple searches, one for each target length.

196

answered Sep 18 '22 22:09

mcdowella

Related questions
                            
                                Best approach to a variation of a bucketing problem
                            
                                Maximum average distance between two numbers across multiple arrays
                            
                                Number of ways to form array using only 3 elements?
                            
                                Mapping a set of 3D points to another set with minimal sum of distances
                            
                                Prediction Market algorithm
                            
                                Algorithm for 2D Interpolation
                            
                                programmatically get BPM of a wave or MP3 from .Net
                            
                                Merge overlapping triangles into a polygon
                            
                                Covariance matrix computation
                            
                                Cosine Similarity of Vectors, with < O(n^2) complexity
                            
                                given a set of conditions, algorithmically determine only one can be True
                            
                                How VIM Autocomplete Work
                            
                                Ideal hashing method for wide distribution of values?
                            
                                Generate all possible numpad/keypad sequences
                            
                                Distance between two convex polygons in 3D
                            
                                What's the best way to inherit properties in a tree-based structure?
                            
                                Using the Levenshtein distance in a spell checker
                            
                                Trilateration of a signal using Time Difference of Arrival
                            
                                Binary Search Tree Traversal - PreOrder
                            
                                parsing words in a continuous string

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How Can I Speed Up This Anagram Algorithm

Tags:

performance

algorithm

logic

anagram

coneybeare

People also ask

1 Answers

mcdowella

Recent Activity

Donate For Us