
Automatically sorted by values map in Java

I need a map in Java that is automatically sorted by its values, so that it stays sorted at all times while I add new key-value pairs, update the value of an existing key-value pair, or delete an entry.

Please also keep in mind that this map is going to be really big (hundreds of thousands, or even tens of millions, of entries).

So basically I'm looking for the following functionality:

Suppose we had a class 'SortedByValuesMap' that implements this functionality, and we have the following code:

SortedByValuesMap<String, Long> sorted_map = new SortedByValuesMap<String, Long>();
// The values are Long, so the literals need the L suffix (an int will not autobox to Long)
sorted_map.put("apples", 4L);
sorted_map.put("oranges", 2L);
sorted_map.put("bananas", 1L);
sorted_map.put("lemons", 3L);
sorted_map.put("bananas", 6L);

for (String key : sorted_map.keySet()) {
  System.out.println(key + ":" + sorted_map.get(key));
}

the output should be:

bananas:6
apples:4
lemons:3
oranges:2

In particular, what's really important to me is being able to get the entry with the lowest value at any time, using a command like:

smallestItem = sorted_map.lastEntry();

which should give me the 'oranges' entry.

EDIT: I am a Java newbie, so please elaborate a bit in your answers. Thanks.

EDIT2: This might help: I am using this for counting words (for those who are familiar: n-grams in particular) in huge text files. So I need to build a map where the keys are words and the values are the frequencies of those words. However, due to limitations (like RAM), I want to keep only the X most frequent words, and of course you can't know beforehand which words are going to be the most frequent. So the way I thought it might work (as an approximation) is to start counting words and, when the map reaches an upper limit (like 1 million entries), delete the least frequent entry so that the map's size always stays at 1 million.

Alexandros asked Sep 19 '11 00:09



3 Answers

Keep 2 data structures:

  • A dictionary of words -> count. Just use an ordinary HashMap<String, Long>.
  • An "array" to keep track of order, such that list[count] holds a Set<String> of words with that count.

    I'm writing this as though it were an array as a notational convenience. In fact, you probably don't know an upper bound on the number of occurrences, so you need a resizable data structure. Implement it using a Map<Long, Set<String>>. Or, if that uses too much memory, use an ArrayList<Set<String>> (you'll have to test for count == size() - 1 and, if so, use add() instead of set() to store the bucket at index count + 1).

To increment the number of occurrences for a word (pseudocode):

// assumes data structures are in instance variables dict and arr
public void tally(final String word)
{
    final long count = this.dict.get(word) or 0 if absent;
    this.dict.put(word, count + 1);
    // move word up one place in arr
    this.arr[count].remove(word);   // This is why we use a Set: for fast deletion here.
    this.arr[count + 1].add(word);
}

To iterate over words in order (pseudocode):

for(int count = 0; count < arr.size; count++)
    for(final String word : this.arr[count])
        process(word, count);
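
The pseudocode above could be turned into runnable Java roughly as follows. This is only a sketch under some assumptions: the class name WordTally and the methods count(), size(), inOrder() and evictLeastFrequent() are hypothetical, the "array" is a TreeMap<Long, Set<String>> (so the counts stay in ascending order and empty buckets can be dropped), and the eviction method is an addition aimed at the size limit described in the question, not part of the pseudocode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical class name; the fields mirror "dict" and "arr" from the pseudocode.
class WordTally {
    // word -> number of occurrences
    private final Map<String, Long> dict = new HashMap<>();
    // count -> words with exactly that count (TreeMap keeps the counts ascending)
    private final TreeMap<Long, Set<String>> arr = new TreeMap<>();

    public void tally(String word) {
        long count = dict.getOrDefault(word, 0L);
        dict.put(word, count + 1);
        // Move the word from its old bucket (if any) to the next one up.
        Set<String> old = arr.get(count);
        if (old != null) {
            old.remove(word);
            if (old.isEmpty()) {
                arr.remove(count); // drop empty buckets so the first key is the real minimum
            }
        }
        arr.computeIfAbsent(count + 1, k -> new HashSet<>()).add(word);
    }

    public long count(String word) {
        return dict.getOrDefault(word, 0L);
    }

    public int size() {
        return dict.size();
    }

    // Assumption: removes and returns one word with the lowest count, or null if empty.
    public String evictLeastFrequent() {
        if (arr.isEmpty()) {
            return null;
        }
        Map.Entry<Long, Set<String>> lowest = arr.firstEntry();
        Set<String> bucket = lowest.getValue();
        String victim = bucket.iterator().next();
        bucket.remove(victim);
        if (bucket.isEmpty()) {
            arr.remove(lowest.getKey());
        }
        dict.remove(victim);
        return victim;
    }

    // Words in ascending order of count, formatted as "word:count".
    public List<String> inOrder() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> e : arr.entrySet()) {
            for (String word : e.getValue()) {
                out.add(word + ":" + e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WordTally t = new WordTally();
        for (String w : "a b a c a b".split(" ")) {
            t.tally(w);
        }
        System.out.println(t.inOrder());            // [c:1, b:2, a:3]
        System.out.println(t.evictLeastFrequent()); // c
    }
}
```

Both tally() and evictLeastFrequent() touch only one or two buckets, so neither needs to re-sort the whole collection. Words that share a count come out in no particular order, because each bucket is a HashSet.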
Mechanical snail answered Oct 08 '22 17:10


How about using an additional index, or only a TreeMap<Long, TreeSet<String>> (or a TreeMap<Long, String> if the Long values are distinct)?
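
As a rough illustration of the "additional index" idea, the sketch below pairs a HashMap (for lookups by key) with a TreeMap<Long, TreeSet<String>> ordered from the highest value to the lowest. The class name ValueIndexedMap and its methods put(), get(), remove(), lastKey() and keysByValue() are hypothetical names chosen for this example; it is one possible arrangement, not a drop-in Map implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class: a HashMap for lookups plus a TreeMap index ordered by value.
class ValueIndexedMap {
    private final Map<String, Long> values = new HashMap<>();
    // value -> keys holding that value, highest value first
    private final TreeMap<Long, TreeSet<String>> index =
            new TreeMap<>(Collections.reverseOrder());

    public void put(String key, long value) {
        remove(key); // drop any stale index entry for this key first
        values.put(key, value);
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(key);
    }

    public Long get(String key) {
        return values.get(key);
    }

    public void remove(String key) {
        Long old = values.remove(key);
        if (old == null) {
            return; // key was not present
        }
        TreeSet<String> keys = index.get(old);
        keys.remove(key);
        if (keys.isEmpty()) {
            index.remove(old);
        }
    }

    // A key with the smallest value, or null if the map is empty.
    public String lastKey() {
        return index.isEmpty() ? null : index.lastEntry().getValue().last();
    }

    // Keys ordered from the highest value to the lowest.
    public List<String> keysByValue() {
        List<String> out = new ArrayList<>();
        for (TreeSet<String> keys : index.values()) {
            out.addAll(keys);
        }
        return out;
    }

    public static void main(String[] args) {
        ValueIndexedMap m = new ValueIndexedMap();
        m.put("apples", 4);
        m.put("oranges", 2);
        m.put("bananas", 1);
        m.put("lemons", 3);
        m.put("bananas", 6);
        System.out.println(m.keysByValue()); // [bananas, apples, lemons, oranges]
        System.out.println(m.lastKey());     // oranges
    }
}
```

Each put() or remove() costs a hash lookup plus O(log n) tree operations, and equal values are handled because every value maps to a set of keys.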

You can also write a heap.
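
To give a feel for the heap idea, here is a small sketch that uses the standard java.util.PriorityQueue (a binary heap) rather than a hand-written one. It assumes the counts have already been collected in an ordinary Map<String, Long> and only selects the k most frequent keys; the class name TopK and the method mostFrequent() are hypothetical. Note that PriorityQueue.remove(Object) takes linear time, so this structure by itself is not well suited to updating a count that is already in the heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: picks the k most frequent keys with a min-heap of size k.
class TopK {
    // Returns the k keys with the highest counts, most frequent first.
    static List<String> mostFrequent(Map<String, Long> counts, int k) {
        // Min-heap on the count: the root is the weakest of the current candidates.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(
                Comparator.comparingLong((Map.Entry<String, Long> e) -> e.getValue()));
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // discard the least frequent candidate
            }
        }
        // The heap yields ascending counts, so reverse to get the most frequent first.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("apples", 4L);
        counts.put("oranges", 2L);
        counts.put("bananas", 6L);
        counts.put("lemons", 3L);
        System.out.println(mostFrequent(counts, 2)); // [bananas, apples]
    }
}
```

The heap never holds more than k + 1 entries, so the selection runs in O(n log k) time over n counted words.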

NiematojakTomasz answered Oct 08 '22 19:10


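Guava BiMap solution (this works only if the values are distinct, because a BiMap rejects a value that is already bound to another key):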

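import com.google.common.collect.BiMap;
import com.google.common.collect.HashBiMap;
import java.util.Comparator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Prepare the original data; the second put for "bananas" replaces its old value
BiMap<String, Integer> biMap = HashBiMap.create();
biMap.put("apples" , 4);
biMap.put("oranges", 2);
biMap.put("bananas", 1);
biMap.put("lemons" , 3);
biMap.put("bananas", 6);

// Create a SortedMap in descending key order
SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>(new Comparator<Integer>() {
    @Override public int compare(Integer o1, Integer o2) {
        return o2.compareTo(o1); // compareTo avoids the overflow that o2 - o1 can cause
    }
});

// Put the inverse map (value -> key) into the sorted map
sortedMap.putAll(biMap.inverse());
for (Map.Entry<Integer, String> e : sortedMap.entrySet()) {
    System.out.println(e);
}
System.out.println(sortedMap.lastKey()); // the smallest value, here 2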
卢声远 Shengyuan Lu answered Oct 08 '22 18:10