Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting words in order of frequency? (least to greatest)

Tags:

java

sorting

does any one have any idea how to sort a list of words in the order of their frequency (least to greatest) using the built in collection.sort and a comparator<string> interface?

I already have a method that gets the count of a certain word in the text file. Now, I just need to create a method that compares the counts of each word and then puts them in a list sorted by the least frequency to the greatest.

Any ideas and tips would be very much appreciated. I'm having trouble getting started on this particular method.

public class Parser implements Comparator<String> {

    public Map<String, Integer> wordCount;

    void parse(String filename) throws IOException {
        File file = new File(filename);
        Scanner scanner = new Scanner(file);

        //mapping of string -> integer (word -> frequency)
        Map<String, Integer> wordCount = new HashMap<String, Integer>();

        //iterates through each word in the text file
        while(scanner.hasNext()) {
            String word = scanner.next();
            if (scanner.next()==null) {
                wordCount.put(word, 1);
            }
            else {
                wordCount.put(word, wordCount.get(word) + 1);;
                }
            }
            scanner.next().replaceAll("[^A-Za-z0-9]"," ");
            scanner.next().toLowerCase();
        }

    public int getCount(String word) {
        return wordCount.get(word);
    }

    public int compare(String w1, String w2) {
        return getCount(w1) - getCount(w2);
    } 

        //this method should return a list of words in order of frequency from least to   greatest
    public List<String> getWordsInOrderOfFrequency() {
        List<Integer> wordsByCount = new ArrayList<Integer>(wordCount.values());
        //this part is unfinished.. the part i'm having trouble sorting the word frequencies
        List<String> result = new ArrayList<String>();


    }
}
like image 670
user1333781 Avatar asked Jan 17 '23 22:01

user1333781


2 Answers

First of all your usage of scanner.next() seems incorrect. next() will return the next word and move onto next one every time you call it, therefore the following code:

if(scanner.next() == null){ ... }

and also

scanner.next().replaceAll("[^A-Za-z0-9]"," ");
scanner.next().toLowerCase();

will consume and then just throw away words. What you probably want to do is:

String word = scanner.next().replaceAll("[^A-Za-z0-9]"," ").toLowerCase();

at the beginning of your while loop, so that the changes to your word are saved in the word variable, and not just thrown away.

Secondly, the usage of the wordCount map is slightly broken. What you want to do is to check if the word is already in the map to decide what word count to set. To do this, instead of checking for scanner.next() == null you should look in the map, for example:

if(!wordCount.containsKey(word)){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, wordCount.get(word) + 1);
}

alternatively you can do this:

Integer count = wordCount.get(word);
if(count == null){
  //no count registered for the word yet
  wordCount.put(word, 1);
}else{
  wordCount.put(word, count+1);
}

I would prefer this approach, because it's a bit cleaner, and does only one map look-up per word, whereas the first approach sometimes does two look-ups.

Now, to get a list of words in descending order of frequencies, you can convert your map to a list first, then apply Collections.sort() as was suggested in this post. Below is a simplified version suited to your needs:

static List<String> getWordInDescendingFreqOrder(Map<String, Integer> wordCount) {

    // Convert map to list of <String,Integer> entries
    List<Map.Entry<String, Integer>> list = 
        new ArrayList<Map.Entry<String, Integer>>(wordCount.entrySet());

    // Sort list by integer values
    Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
        public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
            // compare o2 to o1, instead of o1 to o2, to get descending freq. order
            return (o2.getValue()).compareTo(o1.getValue());
        }
    });

    // Populate the result into a list
    List<String> result = new ArrayList<String>();
    for (Map.Entry<String, Integer> entry : list) {
        result.add(entry.getKey());
    }
    return result;
}

Hope this helps.

Edit: Changed the comparison function as suggested by @dragon66. Thanks.

like image 133
rodion Avatar answered Jan 20 '23 15:01

rodion


You can compare and extract ideas from the following:

public class FrequencyCount {

    public static void main(String[] args) {

        // read in the words as an array
        String s = StdIn.readAll();
        // s = s.toLowerCase();
        // s = s.replaceAll("[\",!.:;?()']", "");
        String[] words = s.split("\\s+");

        // sort the words
        Merge.sort(words);

        // tabulate frequencies of each word
        Counter[] zipf = new Counter[words.length];
        int M = 0;                                        // number of distinct words
        for (int i = 0; i < words.length; i++) {
            if (i == 0 || !words[i].equals(words[i-1]))   // short-circuiting OR
                zipf[M++] = new Counter(words[i], words.length);
            zipf[M-1].increment();
        }

        // sort by frequency and print
        Merge.sort(zipf, 0, M);                           // sorting a subarray
        for (int j = M-1; j >= 0; j--) {
            StdOut.println(zipf[j]);
        }
    }
}
like image 34
UVM Avatar answered Jan 20 '23 14:01

UVM