I am trying to implement a program that will take a users input, split that string into tokens, and then search a dictionary for the words in that string. My goal for the parsed string is to have every single token be an English word.
For Example:
Input:
aman
Split Method:
a man
a m an
a m a n
am an
am a n
ama n
Desired Output:
a man
I currently have this code which does everything up until the desired output part:
import java.util.Scanner;
import java.io.*;
public class Words {
public static String[] dic = new String[80368];
public static void split(String head, String in) {
// head + " " + in is a segmentation
String segment = head + " " + in;
// count number of dictionary words
int count = 0;
Scanner phraseScan = new Scanner(segment);
while (phraseScan.hasNext()) {
String word = phraseScan.next();
for (int i=0; i<dic.length; i++) {
if (word.equalsIgnoreCase(dic[i])) count++;
}
}
System.out.println(segment + "\t" + count + " English words");
// recursive calls
for (int i=1; i<in.length(); i++) {
split(head+" "+in.substring(0,i), in.substring(i,in.length()));
}
}
public static void main (String[] args) throws IOException {
Scanner scan = new Scanner(System.in);
System.out.print("Enter a string: ");
String input = scan.next();
System.out.println();
Scanner filescan = new Scanner(new File("src:\\dictionary.txt"));
int wc = 0;
while (filescan.hasNext()) {
dic[wc] = filescan.nextLine();
wc++;
}
System.out.println(wc + " words stored");
split("", input);
}
}
I know there are better ways to store the dictionary (such as a binary search tree or a hash table), but I don't know how to implement those anyway.
I am stuck on how to implement a method that would check the split string to see if every segment was a word in the dictionary.
Any help would be great, Thank you
Splitting the input string every possible way is not going to finish in a reasonable amount of time if you want to support 20 or more characters. Here's a more efficient approach, comments inline:
public static void main(String[] args) throws IOException {
// load the dictionary into a set for fast lookups
Set<String> dictionary = new HashSet<String>();
Scanner filescan = new Scanner(new File("dictionary.txt"));
while (filescan.hasNext()) {
dictionary.add(filescan.nextLine().toLowerCase());
}
// scan for input
Scanner scan = new Scanner(System.in);
System.out.print("Enter a string: ");
String input = scan.next().toLowerCase();
System.out.println();
// place to store list of results, each result is a list of strings
List<List<String>> results = new ArrayList<>();
long time = System.currentTimeMillis();
// start the search, pass empty stack to represent words found so far
search(input, dictionary, new Stack<String>(), results);
time = System.currentTimeMillis() - time;
// list the results found
for (List<String> result : results) {
for (String word : result) {
System.out.print(word + " ");
}
System.out.println("(" + result.size() + " words)");
}
System.out.println();
System.out.println("Took " + time + "ms");
}
public static void search(String input, Set<String> dictionary,
Stack<String> words, List<List<String>> results) {
for (int i = 0; i < input.length(); i++) {
// take the first i characters of the input and see if it is a word
String substring = input.substring(0, i + 1);
if (dictionary.contains(substring)) {
// the beginning of the input matches a word, store on stack
words.push(substring);
if (i == input.length() - 1) {
// there's no input left, copy the words stack to results
results.add(new ArrayList<String>(words));
} else {
// there's more input left, search the remaining part
search(input.substring(i + 1), dictionary, words, results);
}
// pop the matched word back off so we can move onto the next i
words.pop();
}
}
}
Example output:
Enter a string: aman
a man (2 words)
am an (2 words)
Took 0ms
Here's a much longer input:
Enter a string: thequickbrownfoxjumpedoverthelazydog
the quick brown fox jump ed over the lazy dog (10 words)
the quick brown fox jump ed overt he lazy dog (10 words)
the quick brown fox jumped over the lazy dog (9 words)
the quick brown fox jumped overt he lazy dog (9 words)
Took 1ms
If my answer seems silly, it's because you're really close and I'm not sure where you're stuck.
The simplest way given your code above would be to simply add a counter for the number of words and compare that to the number of matched words
int count = 0; int total = 0;
Scanner phraseScan = new Scanner(segment);
while (phraseScan.hasNext()) {
total++
String word = phraseScan.next();
for (int i=0; i<dic.length; i++) {
if (word.equalsIgnoreCase(dic[i])) count++;
}
}
if(total==count) System.out.println(segment);
Implementing this as a hash-table might be better (it's faster, for sure), and it'd be really easy.
HashSet<String> dict = new HashSet<String>()
dict.add("foo")// add your data
int count = 0; int total = 0;
Scanner phraseScan = new Scanner(segment);
while (phraseScan.hasNext()) {
total++
String word = phraseScan.next();
if(dict.contains(word)) count++;
}
There are other, better ways to do this. One is a trie (http://en.wikipedia.org/wiki/Trie) which is a bit slower for lookup but stores data more efficiently. If you have a large dictionary, you might not be able ot fit it in memory, so you could use a database or key-value store like a BDB (http://en.wikipedia.org/wiki/Berkeley_DB)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With