Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HashMap alternatives for memory-efficient data storage

I've currently got a spreadsheet type program that keeps its data in an ArrayList of HashMaps. You'll no doubt be shocked when I tell you that this hasn't proven ideal. The overhead seems to use 5x more memory than the data itself.

This question asks about efficient collections libraries, and the answer was use Google Collections. My follow up is "which part?". I've been reading through the documentation but don't feel like it gives a very good sense of which classes are a good fit for this. (I'm also open to other libraries or suggestions).

So I'm looking for something that will let me store dense spreadsheet-type data with minimal memory overhead.

  • My columns are currently referenced by Field objects, rows by their indexes, and values are Objects, almost always Strings
  • Some columns will have a lot of repeated values
  • primary operations are to update or remove records based on values of certain fields, and also adding/removing/combining columns

I'm aware of options like H2 and Derby but in this case I'm not looking to use an embedded database.

EDIT: If you're suggesting libraries, I'd also appreciate it if you could point me to a particular class or two in them that would apply here. Whereas Sun's documentation usually includes information about which operations are O(1), which are O(N), etc, I'm not seeing much of that in third-party libraries, nor really any description of which classes are best suited for what.

like image 501
Brad Mace Avatar asked Oct 19 '10 19:10

Brad Mace


People also ask

What can I use instead of HashMap?

The LinkedHashMap is quite similar to HashMap, with an additional feature of maintaining the order of the inserted element. HashMap provides an easy way to insert, delete, and search the elements, but it does not provide any way to maintain and track the order of the inserted elements.

Is HashMap memory efficient?

WeakHashMap as an Efficient Memory CacheUsing a simple HashMap will not be a good choice because the value objects may occupy a lot of memory. What's more, they'll never be reclaimed from the cache by a GC process, even when they are not in use in our application anymore.

Is TreeMap or HashMap faster?

Conclusions. HashMap is a general purpose Map implementation. It provides a performance of O(1) , while TreeMap provides a performance of O(log(n)) to add, search, and remove items. Hence, HashMap is usually faster.

Does HashMap use more memory?

To use them inside another data structure, you have to pay the price of a pointer to the object. In Java, the best way to save memory is to use a library, like fastutil, that works directly with native types. Conclusion: Whether you use a TreeMap or HashMap seems to have very little effect on your memory usage.


1 Answers

Some columns will have a lot of repeated values

immediately suggests to me the possible use of the FlyWeight pattern, regardless of the solution you choose for your collections.

like image 84
Brian Agnew Avatar answered Sep 19 '22 06:09

Brian Agnew