
Most efficient way to order an array of Strings by frequency

I have an array of Strings:

String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

What is the quickest/most efficient way to reduce this to a smaller Collection of the distinct Strings, ordered by how frequently each one occurs and paired with its frequency?

I thought about using the String as a key in a HashMap<String,Integer>, but that wouldn't be sorted by frequency.

The other method I considered is a TreeMap<Integer, String[]> that maps each frequency to the Strings having it, but that seems to involve a lot of checking.

I'm trying to avoid using more than one loop if possible, as my String arrays will be much larger than the one above. Thanks!

EDIT: What I want is just to be able to output the Strings in order of frequency and, preferably, to pair each String with its frequency in the array. For example, two output arrays:

["x", "y", "z", "a"]
[3,2,1,1]

This would be quite a simple problem if speed weren't an issue, which is why I'm asking the great minds on here :)

Eduardo asked Sep 06 '13 14:09



2 Answers

You can solve this in two steps:

  1. Create a counter object: a Map<String, Integer> that records, for each string, the number of times it appears in the input. In other words, a frequency map. This is O(n), as you only need to traverse the input once to build the map.

  2. From that map, create a list of its keys, sorted using the frequency of each item (the values in the map) as the ordering criterion. This is O(n log n) in the worst case (more precisely O(k log k) for k distinct strings), and you can call Collections.sort() with a Comparator that uses the string frequency for the comparisons.

This is what I mean:

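// Needs: import java.util.*;
String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};

// Step 1: build the frequency map in a single pass over the input.
final Map<String, Integer> counter = new HashMap<String, Integer>();
for (String str : stringArray)
    counter.put(str, 1 + (counter.containsKey(str) ? counter.get(str) : 0));

// Step 2: sort the distinct keys by descending frequency.
List<String> list = new ArrayList<String>(counter.keySet());
Collections.sort(list, new Comparator<String>() {
    @Override
    public int compare(String x, String y) {
        // Arguments swapped so that higher counts come first.
        return Integer.compare(counter.get(y), counter.get(x));
    }
});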

After the above code executes, the variable list will contain the following values (the order between elements of the same frequency is unspecified):

[x, y, a, z]

It's trivial to convert the list to an array:

list.toArray(new String[list.size()])

And if you need to find out the frequency of each string, just iterate over the sorted keys:

for (String str : list) {
    int frequency = counter.get(str);
    System.out.print(str + ":" + frequency + ", ");
}
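
For the two parallel output arrays asked for in the question, the pieces above can be put together roughly as follows. This is only a sketch under a few assumptions: Java 8+ (for Map.merge and lambdas), a LinkedHashMap instead of a HashMap so that strings with equal counts keep their first-seen order, and the class and method names (FrequencyOrder, countFrequencies, sortedKeys, frequenciesOf) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FrequencyOrder {

    // One pass over the input; LinkedHashMap keeps keys in first-seen order.
    static Map<String, Integer> countFrequencies(String[] input) {
        Map<String, Integer> counter = new LinkedHashMap<>();
        for (String s : input) {
            counter.merge(s, 1, Integer::sum);
        }
        return counter;
    }

    // Distinct strings, most frequent first. List.sort is stable, so
    // strings with equal counts stay in first-seen order.
    static String[] sortedKeys(Map<String, Integer> counter) {
        List<String> keys = new ArrayList<>(counter.keySet());
        keys.sort((a, b) -> Integer.compare(counter.get(b), counter.get(a)));
        return keys.toArray(new String[0]);
    }

    // Frequencies in the same order as the given keys.
    static int[] frequenciesOf(String[] keys, Map<String, Integer> counter) {
        int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = counter.get(keys[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        Map<String, Integer> counter = countFrequencies(stringArray);
        String[] keys = sortedKeys(counter);
        int[] freqs = frequenciesOf(keys, counter);
        System.out.println(Arrays.toString(keys));  // [x, y, z, a]
        System.out.println(Arrays.toString(freqs)); // [3, 2, 1, 1]
    }
}
```

The counting is still a single pass over the input; the sort and the second array only touch the distinct strings, which is usually a much smaller set than the input.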
Óscar López answered Oct 19 '22 18:10


Use a HashMap<String,Integer> to maintain your counts. This will be the most efficient way to process an arbitrary list of strings.

Create an ArrayList<Map.Entry<String,Integer>> from the map's entrySet().

Sort this list using Collections.sort() and a custom comparator.
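
A rough sketch of those three steps, for illustration only. The class and method names (EntrySortSketch, byFrequency) are hypothetical, and the alphabetical tie-break for strings with equal counts is an added assumption to make the output deterministic; the steps above don't require it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EntrySortSketch {

    static List<Map.Entry<String, Integer>> byFrequency(String[] input) {
        // Step 1: count occurrences in a single pass.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String s : input) {
            Integer old = counts.get(s);
            counts.put(s, old == null ? 1 : old + 1);
        }

        // Step 2: copy the entries into a list that can be sorted.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());

        // Step 3: sort by descending count, then by key (assumed tie-break).
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        String[] stringArray = {"x", "y", "z", "x", "x", "y", "a"};
        // Prints x:3, y:2, a:1, z:1 on separate lines.
        for (Map.Entry<String, Integer> e : byFrequency(stringArray)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

Sorting the entries rather than the keys means each string travels with its count, so the comparator needs no extra map lookups.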

Don't get hung up on micro-optimizations.

parsifal answered Oct 19 '22 17:10