Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to increment a Map value in Java

I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code for better performance that is getting run a bunch of times.

Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.

In Perl, incrementing such a value would be trivially easy:

$map{$word}++; 

But in Java, it's much more complicated. Here the way I'm currently doing it:

int count = map.containsKey(word) ? map.get(word) : 0; map.put(word, count + 1); 

Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using a something else instead?

Update: I've done a test of several of the answers. See below.

like image 528
gregory Avatar asked Sep 17 '08 09:09

gregory


People also ask

How does HashMap increase value by 1?

Map<String, Integer> map = new HashMap<>(); String key = "a random key"; int count = map. getOrDefault(key, 0); // ensure count will be one of 0,1,2,3,... map. put(key, count + 1);

Which map is more efficient in Java?

There is no standard small implementation of Map in Java. HashMap is one of the best and most flexible Map implementations around, and is hard to beat.

How do you increment a value inside a map in thread safe?

instead of using putIfAbsent on main map, just use compute function (and do everything inside that function, even the inner map stuff). Whatever you do inside the function will be thread-safe if using ConcurrentHashMap in the root map; no need for synchronization block if you do it this way.


1 Answers

Some test results

I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:

  • the "ContainsKey" method that I presented in the question
  • the "TestForNull" method suggested by Aleksandar Dimitrov
  • the "AtomicLong" method suggested by Hank Gay
  • the "Trove" method suggested by jrudolph
  • the "MutableInt" method suggested by phax.myopenid.com

Method

Here's what I did...

  1. created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
  2. timed the loop of 10 iterations but not the I/O operation and recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
  3. performed all five tests in series, and then did this another three times.
  4. averaged the four results for each method.

Results

I'll present the results first and the code below for those who are interested.

The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.

  • ContainsKey: 30.654 seconds (baseline)
  • AtomicLong: 29.780 seconds (1.03 times as fast)
  • TestForNull: 28.804 seconds (1.06 times as fast)
  • Trove: 26.313 seconds (1.16 times as fast)
  • MutableInt: 25.747 seconds (1.19 times as fast)

Conclusions

It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.

Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.

Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.

The code

Here is the crucial code from each method.

ContainsKey

import java.util.HashMap; import java.util.Map; ... Map<String, Integer> freq = new HashMap<String, Integer>(); ... int count = freq.containsKey(word) ? freq.get(word) : 0; freq.put(word, count + 1); 

TestForNull

import java.util.HashMap; import java.util.Map; ... Map<String, Integer> freq = new HashMap<String, Integer>(); ... Integer count = freq.get(word); if (count == null) {     freq.put(word, 1); } else {     freq.put(word, count + 1); } 

AtomicLong

import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; import java.util.concurrent.atomic.AtomicLong; ... final ConcurrentMap<String, AtomicLong> map =      new ConcurrentHashMap<String, AtomicLong>(); ... map.putIfAbsent(word, new AtomicLong(0)); map.get(word).incrementAndGet(); 

Trove

import gnu.trove.TObjectIntHashMap; ... TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>(); ... freq.adjustOrPutValue(word, 1, 1); 

MutableInt

import java.util.HashMap; import java.util.Map; ... class MutableInt {   int value = 1; // note that we start at 1 since we're counting   public void increment () { ++value;      }   public int  get ()       { return value; } } ... Map<String, MutableInt> freq = new HashMap<String, MutableInt>(); ... MutableInt count = freq.get(word); if (count == null) {     freq.put(word, new MutableInt()); } else {     count.increment(); } 
like image 152
gregory Avatar answered Sep 24 '22 02:09

gregory