Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multimap Space Issue: Guava

In my Java code, I am using Guava's Multimap (com.google.common.collect.Multimap) by using this:

 Multimap<Integer, Integer> Index = HashMultimap.create()

Here, Multimap key is some portion of a URL and value is another portion of the URL (converted into an integer). Now, I assign my JVM 2560 Mb (2.5 GB) heap space (by using Xmx and Xms). However, it can only store 9 millions of such (key,value) pairs of integers (approx 10 million). But, theoretically (according to memory occupied by int) it should store more.

Can anybody help me,

  1. Why is Multimap using lots of memory? I checked my code and without inserting pairs into the Multimap, it only uses 1/2 MB of memory.
  2. 2.

Is there another way or home-baked solution to solve this memory issue? Means, Is there any way to reduce those object overheads as I want to store only int-int? In any other language ? Or any other solution (home-baked preferred) to solve issue I faced, means DB based or something like that solution.

like image 911
Arpssss Avatar asked Mar 29 '12 17:03

Arpssss


People also ask

Does MultiMap maintain order?

One important thing to note about multimap is that multimap keeps all the keys in sorted order always.

What is difference between map and MultiMap in Java?

The map and the multimap are both containers that manage key/value pairs as single components. The essential difference between the two is that in a map the keys must be unique, while a multimap permits duplicate keys.

What is guava MultiMap?

A Multimap is a new collection type that is found in Google's Guava library for Java. A Multimap can store more than one value against a key. Both the keys and the values are stored in a collection, and considered to be alternates for Map<K, List<V>> or Map<K, Set<V>> (standard JDK Collections Framework).

What is MultiHashMap in Java?

MultiHashMap is the default implementation of the MultiMap interface. A MultiMap is a Map with slightly different semantics. Putting a value into the map will add the value to a Collection at that key. Getting a value will return a Collection, holding all the values put to that key.


2 Answers

There's a huge amount of overhead associated with Multimap. At a minimum:

  • Each key and value is an Integer object, which (at a minimum) doubles the storage requirements of each int value.
  • Each unique key value in the HashMultimap is associated with a Collection of values (according to the source, the Collection is a Hashset).
  • Each Hashset is created with default space for 8 values.

So each key/value pair requires (at a minimum) perhaps an order of magnitude more space than you might expect for two int values. (Somewhat less when multiple values are stored under a single key.) I would expect 10 million key/value pairs to take perhaps 400MB.

Although you have 2.5GB of heap space, I wouldn't be all that surprised if that's not enough. The above estimate is, I think, on the low side. Plus, it only accounts for how much is needed to store the map once it is built. As the map grows, the table needs to be reallocated and rehashed, which temporarily at least doubles the amount of space used. Finally, all this assumes that int values and object references require 4 bytes. If the JVM is using 64-bit addressing, the byte count probably doubles.

like image 64
Ted Hopp Avatar answered Nov 10 '22 14:11

Ted Hopp


Probably the simplest way to minimize the memory overhead would be to potentially mix Trove's primitive collection implementations (to avoid memory overhead of boxing) and Guava's Multimap, something like

SetMultimap<Integer, Integer> multimap = Multimaps.newSetMultimap(
  TDecorators.wrap(TIntObjectHashMap<Collection<Integer>>()),
  new Supplier<Set<Integer>>() {
    public Set<Integer> get() {
      return TDecorators.wrap(new TIntHashSet());
    }
  });

That still has the overhead of boxing and unboxing on queries, but the memory it consumes just sitting there would be significantly reduced.

like image 24
Louis Wasserman Avatar answered Nov 10 '22 14:11

Louis Wasserman