Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iterate through only part of a large list in java

I'm trying to make a Boggle game in Java, and for my program once I randomize the board I have a method which iterates through the possible combinations and compares each one to a dictionary list to check if it's a valid word, and if yes, I put it in the key. It works fine, however the program takes three or four minutes to generate the key, which is mostly due to the size of the dictionary. The one I'm using has about 19k words and comparing every combination takes up a ton of time. Here's the part of the code I'm trying to make faster:

if (str.length()>3&&!key.contains(str)&&prefixes.contains(str.substring(0,3))&&dictionary.contains(str)){
        key.add(str);
    }

where str is the combination generated. prefixes is a list I generated based on dictionary that goes like this:

public void buildPrefixes(){
    for (String word:dictionary){
        if(!prefixes.contains(word.substring(0,3))){
            prefixes.add(word.substring(0,3));
        }
    }       
}

which just adds all the three letter prefixes in the dictionary such as "abb" and "mar" so that when str is jibberish like "xskfjh" it won't get checked against the whole dictionary, just prefixes which is something like 1k words.

What I'm trying to do is cut down on time by iterating through only the words in the dictionary that have the same first letter as str, so if str is "abbey" then it will only check str against words that start with "a" instead of the whole list, which would cut down on time significantly. Or even better, it only checks str against words that have the same prefix. I am pretty new to Java so I would really appreciate if you're very descriptive in your answers, thanks!

like image 536
Ham Lingo Avatar asked Oct 30 '22 05:10

Ham Lingo


1 Answers

What comments are trying to say is - do not reinvent wheel. Java is not Assembler or C and it is powerful enough to handle such trivial cases. Here is simple code which shows that simple Set can handle your vocabulary easy:

import java.util.Set;
import java.util.TreeSet;

public class Work {

    public static void main(String[] args) {
        long startTime=System.currentTimeMillis();
        Set<String> allWords=new TreeSet<String>();
        for (int i=0; i<20000;i++){
            allWords.add(getRandomWord());
        }
        System.out.println("Total words "+allWords.size()+" in "+(System.currentTimeMillis()-startTime)+" milliseconds");

    }

    static String getRandomWord() {
        int length=3+(int)(Math.random()*10);
        String r = "";
        for(int i = 0; i < length; i++) {
            r += (char)(Math.random() * 26 + 97);
        }
        return r;
    }
}

On my computer it shows

Total words 19875 in 47 milliseconds

As you can see 125 words out of 20,000 were duplicated. And it took not only time to generate 20,000 words in very inefficient way but store them as well as check for duplicates.

like image 97
Alex Avatar answered Nov 15 '22 04:11

Alex