In the program I'm currently working on, there's one part that's taking a bit too long. Basically, I have a list of Strings and one target phrase. As an example, let's say the target phrase is "inventory of finished goods". After filtering out the stop word ("of"), I want to extract all Strings from the list that contain at least one of the three words "inventory", "finished", and "goods". Right now, I've implemented the idea as follows:
String[] targetWords; // contains "inventory", "finished", and "goods"
ArrayList<String> extractedStrings = new ArrayList<String>();
for (int i = 0; i < listOfWords.size(); i++) {
    String[] words = listOfWords.get(i).split(" ");
    outerloop:
    for (int j = 0; j < words.length; j++) {
        for (int k = 0; k < targetWords.length; k++) {
            if (words[j].equalsIgnoreCase(targetWords[k])) {
                extractedStrings.add(listOfWords.get(i));
                break outerloop;
            }
        }
    }
}
The list contains over 100k words, and with this approach it takes roughly 0.4 to 0.8 seconds to complete the task for each target phrase. The thing is, I have a lot of these target phrases to process, and the seconds really add up. Does anyone know of a more efficient way to complete this task? Thanks in advance for the help!
Your list of 100k words could be added (once) to a HashSet. Rather than iterating through your list, use wordSet.contains(): a HashSet gives constant-time performance for this on average, so the lookup is not affected by the size of the list.
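A minimal sketch of the HashSet idea, assuming each list entry is a single word and that lower-casing is an acceptable stand-in for equalsIgnoreCase. The class name WordSetLookup and the helpers buildWordSet and findMatches are hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordSetLookup {
    // Build once: store a lower-cased copy of every entry (hypothetical helper).
    static Set<String> buildWordSet(List<String> listOfWords) {
        Set<String> wordSet = new HashSet<>();
        for (String word : listOfWords) {
            wordSet.add(word.toLowerCase(Locale.ROOT));
        }
        return wordSet;
    }

    // Per phrase: one contains() call per target word instead of a scan of the list.
    static List<String> findMatches(Set<String> wordSet, String[] targetWords) {
        List<String> matches = new ArrayList<>();
        for (String target : targetWords) {
            String key = target.toLowerCase(Locale.ROOT);
            if (wordSet.contains(key)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> wordSet = buildWordSet(Arrays.asList("Inventory", "goods", "revenue"));
        String[] targetWords = {"inventory", "finished", "goods"};
        System.out.println(findMatches(wordSet, targetWords)); // prints [inventory, goods]
    }
}
```

Note that this only tells you which target words occur in the list. If the entries are multi-word Strings and you need the Strings themselves back, the set alone is not enough and each word has to be mapped to the entries that contain it.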
You can take your giant list of words and add them to a hash map once. Then, when a phrase comes in, just loop over the words in the phrase and check each one against the hash map. Currently you are doing a linear search over the whole list for every phrase; what I'm proposing would cut each word lookup down to a constant-time search on average.
The key is minimizing lookups. Using this technique, you would effectively be indexing your giant list of words for fast lookups.
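One way the indexing idea might look for multi-word entries, as a rough sketch rather than a definitive implementation: map every lower-cased word to the positions of the Strings that contain it, then union the positions for the target words. The class name PhraseIndex and the methods buildIndex and extract are hypothetical, and lower-casing is assumed to be an acceptable substitute for equalsIgnoreCase:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PhraseIndex {
    // Build once: word -> indices of the list entries that contain that word.
    static Map<String, List<Integer>> buildIndex(List<String> listOfWords) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < listOfWords.size(); i++) {
            for (String word : listOfWords.get(i).toLowerCase(Locale.ROOT).split(" ")) {
                index.computeIfAbsent(word, k -> new ArrayList<>()).add(i);
            }
        }
        return index;
    }

    // Per phrase: one map lookup per target word, then collect the matching entries.
    static List<String> extract(List<String> listOfWords,
                                Map<String, List<Integer>> index,
                                String[] targetWords) {
        // TreeSet removes duplicate hits and keeps the original list order.
        Set<Integer> hits = new TreeSet<>();
        for (String target : targetWords) {
            hits.addAll(index.getOrDefault(target.toLowerCase(Locale.ROOT),
                                           Collections.<Integer>emptyList()));
        }
        List<String> extractedStrings = new ArrayList<>();
        for (int i : hits) {
            extractedStrings.add(listOfWords.get(i));
        }
        return extractedStrings;
    }

    public static void main(String[] args) {
        List<String> listOfWords = Arrays.asList(
                "Finished goods inventory", "raw materials", "Goods in transit");
        Map<String, List<Integer>> index = buildIndex(listOfWords);
        String[] targetWords = {"inventory", "finished", "goods"};
        // prints [Finished goods inventory, Goods in transit]
        System.out.println(extract(listOfWords, index, targetWords));
    }
}
```

The index is built once up front, so the cost per target phrase depends on the number of target words and the number of matching entries, not on the total size of the list.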