Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

300 million items in a Map

  1. If each of them is guaranteed to have a unique key (generated and enforced by an external keying system) which Map implementation is the correct fit for me? Assume this has to be optimized for concurrent lookup only (The data is initialized once during the application startup).
  2. Does this 300 million unique keys have any positive or negative implications on bucketing/collisions?
  3. Any other suggestions?

My map would look something like this

Map<String, <boolean, boolean, boolean, boolean>>
like image 938
Aravind Yarram Avatar asked Jan 26 '13 16:01

Aravind Yarram


2 Answers

I would not use a map, this needs to much memory. Especially in your case. Store the values in one data array, and store the keys in a sorted index array. In the sorted array you use binSearch to find the position of a key in data[].

The tricky part will be building up the array, without running out of memory.

you dont need to consider concurreny because you only read from the data

Further try to avoid to use a String as key. try to convert them to long.
the advantage of this solution: search time garuanteed to not exceed log n. even in worst cases when keys make problems with hashcode

like image 73
AlexWien Avatar answered Nov 09 '22 15:11

AlexWien


Other suggestion? You bet.

Use a proper key-value store, Redis is the first option that comes to mind. Sure it's a separate process and dependency, but you'll win big time when it comes to proper system design.

There should be a very good reason why you would want to couple your business logic with several gigs of data in same process memory, even if it's ephemeral. I've tried this several times, and was always proved wrong.

like image 23
Yuval Adam Avatar answered Nov 09 '22 14:11

Yuval Adam