Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory efficient multivaluemap

Hi I have the following problem: I'm storing strings and a corresponding list of integer values in an MultiValueMap<String, Integer> I'm storing about 13 000 000 million strings and one string can have up to 500 or more values. For every single value i will have random access on the Map. So worst case are 13 000 000* 500 put calls. Now the speed of the map is good but the memory overhead gets quite high. A MultiValueMap<String, Integer> is nothing else then a HashMap/TreeMap<String, <ArrayList<Integer>>. Both HashMap and TreeMap have quite a lot of memory Overhead. I wont be modifying the map once it is done, but I need it to be fast and as small as possible for random access in a program. (I'm storing it on disk and loading it on start, the serialized map file takes up about 600mb but in memory its about 3gb?)

the most memory efficient thing would be, to store the String in sorted string array and have a corresponding two dimensional int array for values. So access would be a binary search on the string array and getting the corresponding values.

Now I have three ways to get there:

  1. I use a sorted MultivalueMap (TreeMap) for the creation phase to store everything.After I'm finished with getting all values, I get the string array by calling map.keyset().toArray(new String[0]); Make a two dimensional int array and get all the values from the multivaluemap. Pro: It's easy to implement, It is still fast during creation. Con: It takes up even more memory during the copying from Map to Arrays.

  2. I use Arrays or maybe ArrayLists from the start and store everything in there Pro: least memory overhead. Con: this would be enormously slow because i would have to sort/copy the Array every time a add a new Key, Also i would need to implement my own (propably even slower) sorting to keep the corresponding int array in the same order like the strings. Hard to implement

  3. I use Arrays and a MultivalueMap as buffer. After the program finished 10% or 20% of the creation phase, I will add the values to the Arrays and keep them in order, then start a new Map. Pro: Propably still fast enough and memory efficient enough. Con: Hard to implement.

None of these solutions really feel right to me. Do you know any other solutions to this problem, maybe a memory efficient (MultiValue)Map implementation?

I know I could be using a database so don't bother posting it as an answer. I want to know how i could do this without using a database.

like image 426
samy Avatar asked Feb 16 '12 21:02

samy


People also ask

What is MultiValueMap?

A MultiValueMap decorates another map, allowing it to have more than one value for a key. A MultiMap is a Map with slightly different semantics. Putting a value into the map will add the value to a Collection at that key. Getting a value will return a Collection, holding all the values put to that key.

How much memory does a HashMap use?

A HashMap. Entry is 24 Bytes, not 16, for example. For many cases, this adds up to an enormous amount of memory wasted. For example, a HashMap<Integer, Double> needs about 100 Bytes per stored value due to boxing, with 12 bytes of actual data, and 88 bytes overhead.

How do you use a Multivalue map?

Method SummaryAdd the given single value to the current list of values for the given key. Add all the values of the given list to the current list of values for the given key. Add all the values of the given MultiValueMap to the current values. Add the given value, only when the map does not contain the given key.


1 Answers

If you switched to Guava's Multimap -- I have no idea if that's possible for your application -- you might be able to use Trove and get

ListMultimap<String, Integer> multimap = Multimaps.newListMultimap(
  new HashMap<String, Collection<Integer>>(),
  new Supplier<List<Integer>>() {
    public List<Integer> get() {
      return new TIntListDecorator();
    }
  });

which will make a ListMultimap that uses a HashMap to map to List values backed by int[] arrays, which should be memory-efficient, though you'll pay a small speed penalty because of boxing. You might be able to do something similar for MultiValueMap, though I have no idea what library that's from.

like image 85
Louis Wasserman Avatar answered Nov 02 '22 23:11

Louis Wasserman