Reading an ArrayList Faster

Question

I am writing a solver for a crossword puzzle that reads in a dictionary file and given a pattern returns a list of all words that fit that pattern. I have the functionality working but I need this to work faster. I create a HashMap with the length of words being the key and an ArrayList of the words as the value. Is there someway I can read through the ArrayList faster or is there some better data structure to use?

import java.util.*;

public class CWSolution {

    //create the data structure that will store the dictionary
    private HashMap<Integer,ArrayList<String>> db;

    public CWSolution(List<String> allWords)
    {

    //construct the background structure

        //Create hashmap
        db = new HashMap<Integer,ArrayList<String>>();

        //go through each word
        for(String item : allWords ){
            //if the db does not contain a listing for this word length, create one
            if(!db.containsKey(item.length())){
                ArrayList<String> temp = new ArrayList<String>();
                temp.add(item);
                db.put(item.length(), temp);
            }
            //if it does contain a listing for this word length, add this word to it
            else{
                ArrayList<String> temp = db.get(item.length());
                temp.add(item);
                db.put(item.length(), temp);
            }
        }
    }

    public List<String> solutions(String pattern, int maxRequired)
    {

        //actually look for each pattern

        //create the structures we need
        List<String> answer = new ArrayList<String>();

        //get the relevant array list
        ArrayList<String> temp = db.get(pattern.length());

        //go through the array list word by word
        for(String item : temp ){
            //see if the pattern matches the word, if it does add it to the list, otherwise skip it
            if(matchPattern(pattern, item)){
                answer.add(item);
            }
            //if we reach the required size return it, otherwise keep going
            if(answer.size() == maxRequired){
                return answer;
            }
        }
        return answer;
    }

    private boolean matchPattern(String pattern, String word){
        //TODO implement this function
        //check the word against the pattern
        char star = "*".charAt(0);
        for(int i=0;i<pattern.length();i++){
            if(pattern.charAt(i) != star){
                if(pattern.charAt(i) != word.charAt(i)){
                    return false;
                }
            }
        }
        return true;
    }
}

EDIT: Adding some more information to make this more clear.

Some of the comments were debating this so figured I'd clarify, I am a student in a Data Structures course so there is only so much that I know about, but we are nearing the end of the semester so I have a good idea of basic data structures.

Furthermore I am not as concerned about optimizing the CWSolution() method a I am with optimizing the solutions() method. The speed is being tested as follows and what I am really concerned with is Time2. That is how long it takes to find matching words rather than how long it takes to create the structure.

import java.util.Date;
import java.util.List;


public class CWSpeedTest {

    public static void main(String[] args){
    try{
        FileParser fp = new FileParser("TWL06.txt");
        List<String> solutions = null;
        //Change this to change the pattern
        String pattern = "*S**"; 
        //Change this to change the max solutions
        int maxSolns = 2000;

        List<String> dict = fp.getAllWords();

        Date d1 = new Date();
        CWSolution c = new CWSolution(dict);
        Date d2 = new Date();
        for (int i = 0; i < 1000; i++)
        solutions = c.solutions(pattern,maxSolns);
        Date d3 = new Date();
        System.out.println("Time 1: " + (d2.getTime() - d1.getTime()));
        System.out.println("Time 2: " + (d3.getTime() - d2.getTime()));
        System.out.println("For the pattern: " + pattern);
        System.out.println("With max solutions: " + maxSolns);
        System.out.println(solutions);

    }catch (Exception e){
        e.printStackTrace();
    }
    }
}

Boris Brodski · Accepted Answer

Here is complete rewrite of the algorithm using indexing on all positions and characters. First this algorithm finds the shortest list of the words with a specified character at the specified position found in the pattern. Then it calculates cross section with all other lists of the words (one list per non-star character in the pattern).

import java.util.*;

public class CWSolution {

    class FixLengthDB {
        // Index -> Letter -> All word with the Letter at Index
        HashMap<Integer, HashMap<Character, Set<String>>> indexLetterDb = new HashMap<>();

        public void storeWord(String word) {
            int l = word.length();
            for (int i = 0; i < l; i++) {
                HashMap<Character, Set<String>> letterDb = indexLetterDb.get(i);
                if (letterDb == null) {
                    letterDb = new HashMap<>();
                    indexLetterDb.put(i, letterDb);
                }

                Set<String> list = letterDb.get(word.charAt(i));
                if (list == null) {
                    list = new HashSet<>();
                    letterDb.put(word.charAt(i), list);
                }

                list.add(word);
            }
        }

        public Set<String> getList(int i, char c) {
            HashMap<Character, Set<String>> letterDb = indexLetterDb.get(i);
            if (letterDb == null) {
                return null;
            }
            return letterDb.get(c);
        }
    }

    //create the data structure that will store the dictionary
    private HashMap<Integer,FixLengthDB> db = new HashMap<>();
    private List<String> allWords;

    public CWSolution(List<String> allWords)
    {

        //construct the background structure

        this.allWords = allWords;
        //go through each word
        for(String item : allWords) {
            FixLengthDB fixLengthDB = db.get(item.length());

            if (fixLengthDB == null) {
                fixLengthDB = new FixLengthDB();
                db.put(item.length(), fixLengthDB);
            }

            fixLengthDB.storeWord(item);
        }
    }

    public List<String> solutions(String pattern, int maxRequired)
    {

        FixLengthDB fixLengthDB = db.get(pattern.length());
        if (fixLengthDB == null) {
            return new ArrayList<>();
        }

        Set<String> shortList = null;
        int shortListIndex = 0;
        int l = pattern.length();
        for (int i = 0; i < l; i++) {
            if (pattern.charAt(i) == '*') {
                continue;
            }
            Set<String> set = fixLengthDB.getList(i, pattern.charAt(i));
            if (set == null) {
                return new ArrayList<>();
            }

            if (shortList == null || shortList.size() > set.size()) {
                shortList = set;
                shortListIndex = i;
            }
        }

        if (shortList == null) {
            return allWords;
        }

        HashSet<String> result = new HashSet<>(shortList);
        for (int i = 0; i < l; i++) {
            if (i == shortListIndex || pattern.charAt(i) == '*') {
                continue;
            }
            Set<String> set = fixLengthDB.getList(i, pattern.charAt(i));
            result.retainAll(set);
        }

            // TODO truncate result list according to 'maxRequired' parameter
    return new ArrayList<>(result);
    }
}

Explanation: The algorithm works in two steps

Build an index (in the constructor)
Use index to find matched words (solutions(...))

Build index: The index maintain the sets of string for each word-length/character-index/character.

Here how we add words to the index

Add word: fun
          |||
          ||\--- (length: 3, position 3, character 'n') -> set{"fun"})
          |\---- (length: 3, position 2, character 'u') -> set{"fun"})
          \----- (length: 3, position 1, character 'f') -> set{"fun"})

Add word: run
          |||
          ||\--- (length: 3, position 3, character 'n') -> set{"fun, run"})
          |\---- (length: 3, position 2, character 'u') -> set{"fun, run"})
          \----- (length: 3, position 1, character 'r') -> set{"run"})

Add word: raw
          |||
          ||\--- (length: 3, position 3, character 'w') -> set{"raw"})
          |\---- (length: 3, position 2, character 'a') -> set{"raw"})
          \----- (length: 3, position 1, character 'r') -> set{"run, raw"})

Add word: rar
          |||
          ||\--- (length: 3, position 3, character 'r') -> set{"rar"})
          |\---- (length: 3, position 2, character 'a') -> set{"raw, rar"})
          \----- (length: 3, position 1, character 'r') -> set{"run, raw, rar"})

The database after adding four words (fun, run, raw, rar) is

(length: 3, position 1, character 'f') -> set{"fun"})
(length: 3, position 1, character 'r') -> set{"run, raw, rar"})
(length: 3, position 2, character 'u') -> set{"fun, run"})
(length: 3, position 2, character 'a') -> set{"raw, rar"})
(length: 3, position 3, character 'w') -> set{"raw"})
(length: 3, position 3, character 'r') -> set{"rar"})
(length: 3, position 3, character 'n') -> set{"fun, run"})

Use index: How lets match the pattern ru*

First lets find the matching smallest set in the index. We have only 2 non-star characters, so we checking only two sets

1: (length: 3, position 1, character 'r') -> set{"run, raw, rar"})
2: (length: 3, position 2, character 'u') -> set{"fun, run"})

The smallest set is #2 {"fun, run"}. Now we iterate through all other matching sets (in our case the set #1) and calculate intersections:

{"fun, run"} cross {"run, raw, rar"} => {"run"}

The result is {"run"}.

Darius X. · Answer

If you want to optimize for look-up speed, your ideal solution is where you take all that you know (everything about the pattern), and it gives you only the set of words that matches. Realistically, you are going to use some of what you know to narrow down to a set that is of acceptable size.

In the original code, you are using only one item that you know (i.e. the length) as the key. Millimoose's comments provide the correct answer: create a key that is more discriminating. For instance, suppose you had a two-field key: (Length, Character-Contained)... i.e. 1A, 1B, 1C,... 1Z, 2A... If each pointed to a set, each set would be smaller. You could then use length and any letter from your pattern to get to that set.

Taking it one step further, you could have Millimoose's a three-field key (Length, Position, Character). That way, you take any letter from your pattern, and using those three attributes you can narrow it down to an even smaller list. [As Millimoose points out, what is slowing you down is the String comparisons.]

In theory, you could go all the way and have a set for each possible pattern. For instance, the word "man" would fit patterns "m**","ma*","m*n","*a*","ma*","*an","**n","m*n" and "*an". Each of those could be the key in a map which points to a list (value) that contains the word "man". For instance, "m**" would point to a list that contained "man", "mob", "map", "mid" and so on.

As you do this, you might end up spending too much time initially, when you are constructing the data-structure. Also, you might end up not having enough space to save that data-structure. Those are the trade-offs.

In summary, from your current-key, consider adding more information to the key, while weighing the costs of doing so.

Reading an ArrayList Faster

Tags:

java

performance

arraylist

Lesha

2 Answers

Boris Brodski

Darius X.

Recent Activity

Donate For Us

Reading an ArrayList Faster

Tags:

java

performance

arraylist

Lesha

2 Answers

Boris Brodski

Darius X.

Related questions

Recent Activity

Donate For Us