Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Immutable Map implementation for huge maps

If I have an immutable Map which I might expect (over a very short period of time - like a few seconds) to be adding/removing hundreds of thousands of items from, is the standard HashMap a bad idea? Let's say I want to pass 1Gb of data through the Map in <10 seconds in such a way that the maximum size of the Map at any once instant is only 256Mb.

I get the impression that the map keeps some kind of "history" but I will always be accessing the last-updated table (i.e. I do not pass the map around) because it is a private member variable of an Actor which is updated/accessed only from within reactions.

Basically I suspect that this data structure may be (partly) at fault for issues I am seeing around JVMs going out of memory when reading in large amounts of data in a short time.

Would I be better off with a different map implementation and, if so, what is it?

like image 782
oxbow_lakes Avatar asked Feb 19 '10 16:02

oxbow_lakes


1 Answers

Ouch. Why do you have to use an immutable map? Poor garbage collector! Immutable maps generally require (log n) new objects per operation in addition to (log n) time, or they really just wrap mutable hash maps and layer changesets on top (which slows things down and can increase the number of object creations).

Immutability is great, but this does not seem to me like the time to use it. If I were you, I'd stick with scala.collection.mutable.HashMap. If you need concurrent access, wrap the Java util.concurrent one instead.

You also might want to increase the size of the young generation in the JVM: -Xmn1G or more (assuming you're running with -Xmx3G). Also, use the throughput (parallel) garbage collector.

like image 50
Rex Kerr Avatar answered Nov 16 '22 01:11

Rex Kerr