java string permutations and combinations lookup

Tags:

I'm writing an Android word app. My code includes a method that would find all combinations of the string and the substrings of a 7 letter string with a minimum of length 3. Then compare all available combination to every word in the dictionary to find all the valid words. I'm using a recursive method. Here's the code.

// Gets all the permutations of a string.
void permuteString(String beginningString, String endingString) {
    if (endingString.length() <= 1){
        if((Arrays.binarySearch(mDictionary, beginningString.toLowerCase() +   endingString.toLowerCase())) >= 0){
            mWordSet.add(beginningString + endingString);
        }
    }
    else
        for (int i = 0; i < endingString.length(); i++) {
            String newString = endingString.substring(0, i) + endingString.substring(i + 1);
            permuteString(beginningString + endingString.charAt(i), newString);
      }
}
// Get the combinations of the sub-strings. Minimum 3 letter combinations
void subStrings(String s){
    String newString = "";
    if(s.length() > 3){
        for(int x = 0; x < s.length(); x++){
            newString = removeCharAt(x, s);
            permuteString("", newString);
            subStrings(newString);
        }
    }
}

The above code runs fine but when I installed it on my Nexus s I realized that it runs a bit too slow. It takes a few seconds to complete. About 3 or 4 seconds which is unacceptable. Now I've played some word games on my phone and they compute all the combinations of a string instantly which makes me believe that my algorithm is not very efficient and it can be improved. Can anyone help?

public class TrieNode {
TrieNode a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z;
TrieNode[] children = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z};
private ArrayList<String> words = new ArrayList<String>();

public void addWord(String word){
    words.add(word);
}
public ArrayList<String> getWords(){
    return words;
}
}

public class Trie {

static String myWord;
static String myLetters = "afinnrty";
static char[] myChars;
static Sort sort;
static TrieNode myNode = new TrieNode();
static TrieNode currentNode;
static int y = 0;
static ArrayList<String> availableWords = new ArrayList<String>();

public static void main(String[] args) {

    readWords();
    getPermutations();
}
public static void getPermutations(){
    currentNode = myNode;
    for(int x = 0; x < myLetters.length(); x++){
        if(currentNode.children[myLetters.charAt(x) - 'a'] != null){
            //availableWords.addAll(currentNode.getWords());
            currentNode = currentNode.children[myLetters.charAt(x) - 'a'];
            System.out.println(currentNode.getWords() + "" + myLetters.charAt(x));
        }
    }
    //System.out.println(availableWords);
}
public static void readWords(){
    try {
        BufferedReader in = new BufferedReader(new FileReader("c://scrabbledictionary.txt"));
        String str;
        while ((str = in.readLine()) != null) {
            myWord = str;
            myChars = str.toCharArray();
            sort = new Sort(myChars);
            insert(myNode, myChars, 0);
        }
        in.close();
    } catch (IOException e) {
    }
}
public static void insert(TrieNode node, char[] myChars, int x){    
    if(x >= myChars.length){
        node.addWord(myWord);
        //System.out.println(node.getWords()+""+y);
        y++;
        return;
    }
    if(node.children[myChars[x]-'a'] == null){
        insert(node.children[myChars[x]-'a'] = new TrieNode(), myChars, x=x+1);
    }else{
        insert(node.children[myChars[x]-'a'], myChars, x=x+1);
    }
}
}

624

asked Feb 04 '12 03:02

zataar

1 Answers

In your current approach, you're looking up every permutation of each substring. So for "abc", you need to look up "abc", "acb", "bac", "bca", "cab" and "cba". If you wanted to find all permutations of "permutations", your number of lookups is nearly 500,000,000, and that's before you've even looked at its substrings. But we can reduce this to one lookup, regardless of length, by preprocessing the dictionary.

The idea is to put each word in the dictionary into some data structure where each element contains a set of characters, and a list of all words containing (only) those characters. So for example, you could build a binary tree, which would have a node containing the (sorted) character set "abd" and the word list ["bad", "dab"]. Now, if we want to find all permutations of "dba", we sort it to give "abd" and look it up in the tree to retrieve the list.

As Baumann pointed out, tries are well suited to storing this kind of data. The beauty of the trie is that the lookup time depends only on the length of your search string - it is independent of the size of your dictionary. Since you'll be storing quite a lot of words, and most of your search strings will be tiny (the majority will be the 3-character substrings from the lowest level of your recursion), this structure is ideal.

In this case, the paths down your trie would reflect the character sets rather than the words themselves. So if your entire dictionary was ["bad", "dab", "cab", "cable"], your lookup structure would end up looking like this:

Example trie

There's a bit of a time/space trade-off in the way you implement this. In the simplest (and fastest) approach, each Node contains just the list of words, and an array Node[26] of children. This allows you to locate the child you're after in constant time, just by looking at children[s.charAt(i)-'a'] (where s is your search string and i is your current depth in the trie).

The downside is that most of your children arrays will be mostly empty. If space is an issue, you can use a more compact representation like a linked list, dynamic array, hash table, etc. However, these come at the cost of potentially requiring several memory accesses and comparisons at each node, instead of the simple array access above. But I'd be surprised if the wasted space was more than a few megabytes over your whole dictionary, so the array-based approach is likely your best bet.

With the trie in place, your whole permutation function is replaced with one lookup, bringing the complexity down from O(N! log D) (where D is the size of your dictionary, N the size of your string) to O(N log N) (since you need to sort the characters; the lookup itself is O(N)).

EDIT: I've thrown together an (untested) implementation of this structure: http://pastebin.com/Qfu93E80

127

answered Oct 26 '22 18:10

Nick Barnes

Related questions
                            
                                want to create a map<String , Object> ,Object can be String and can be class Object
                            
                                Algorithm to generate random number of size X
                            
                                Hazelcast dedicated nodes
                            
                                error - java.lang.IllegalArgumentException: URI scheme is not "file"?
                            
                                How can I store Java types, allowing only some specific ones?
                            
                                JFileChooser from a command line program and popping up Underneath all windows
                            
                                Creating an immutable object without final fields?
                            
                                get absolute path to apache webcontents folder from a java class file [duplicate]
                            
                                FindBugs RV_ABSOLUTE_VALUE_OF_RANDOM_INT warning
                            
                                How do I programmatically set the time zone in Java?
                            
                                What's the proper way to use reflection to instantiate objects of unknown classes at runtime?
                            
                                Java reflection: How to get methods with no parameters only
                            
                                Struts 2 "%" sign and '#" sign in OGNL
                            
                                Why the Java language need bytecode? Why java design in this way? [closed]
                            
                                Copy file to assets folder
                            
                                How do I replace an element?
                            
                                Make column read-only using apache poi
                            
                                Downgrade Eclipse ant to compile while using older JDK libraries (Java 1.7 -> 1.6)
                            
                                Compare date's date part only with Timestamp in Hibernate
                            
                                Setting Up Maven 3 properly

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

java string permutations and combinations lookup

Tags:

java

algorithm

permutation

combinations

zataar

People also ask

1 Answers

Nick Barnes

Recent Activity

Donate For Us