I'm writing a Java program that parses all the words from a text file and then adds them to a HashMap. I need to count how many distinct words the file contains. I also need to find the words with the highest counts. The HashMap maps each word to an integer that represents how many times the word occurs.
Is there something like HashMap that will help me sort this?
The manual way to do it involves a class with word and count fields.
You could use a HashMultiset from google-collections:
import com.google.common.collect.*;
import com.google.common.collect.Multiset.Entry;
...
final Multiset<String> words = HashMultiset.create();
words.addAll(...);
Ordering<Entry<String>> byIncreasingCount = new Ordering<Entry<String>>() {
  @Override public int compare(Entry<String> left, Entry<String> right) {
    // safe because count is never negative
    return left.getCount() - right.getCount();
  }
};
Entry<String> maxEntry = byIncreasingCount.max(words.entrySet());
return maxEntry.getElement();
EDIT: oops, I thought you wanted only the single most common word. But it sounds like you want the several most common, so you could replace max with sortedCopy and now you have a list of all the entries in order. With byIncreasingCount the list is sorted by ascending count, so the most common words come last.
To find the number of distinct words: words.elementSet().size()
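For comparison, the manual approach mentioned earlier (a class with word and count fields) might look roughly like the sketch below, using only the JDK. The names WordCount, WordCounts and sortedByCountDescending are made up for illustration, and the alphabetical tie-break is an assumption added so the order is deterministic. It starts from the Map<String, Integer> described in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder pairing a word with its number of occurrences.
final class WordCount {
    final String word;
    final int count;

    WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

// Hypothetical helper class; not part of any library.
final class WordCounts {
    // Copies every map entry into a WordCount and sorts by count, highest first.
    // Ties are broken alphabetically (an assumption, to keep the order stable).
    static List<WordCount> sortedByCountDescending(Map<String, Integer> counts) {
        List<WordCount> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.add(new WordCount(e.getKey(), e.getValue()));
        }
        Collections.sort(result, new Comparator<WordCount>() {
            @Override public int compare(WordCount a, WordCount b) {
                int byCount = Integer.compare(b.count, a.count);
                return byCount != 0 ? byCount : a.word.compareTo(b.word);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Build the word -> count map the question describes.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : "the cat and the dog and the bird".split(" ")) {
            Integer old = counts.get(w);
            counts.put(w, old == null ? 1 : old + 1);
        }
        // The number of distinct words is simply the number of keys.
        System.out.println("distinct words: " + counts.size());
        for (WordCount wc : sortedByCountDescending(counts)) {
            System.out.println(wc.word + " " + wc.count);
        }
    }
}
```

With a plain HashMap, the number of distinct words is just counts.size(), and the first few elements of the sorted list are the most common words.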