 

Java: what is the official way to store large data (>2GB) in memory?

It is a well-known limitation that the common way of storing data, in byte[] arrays, is limited to just under 2^31 bytes (2 GB), because Java arrays are indexed by int.

There are plenty of Java bug reports and Java Specification Requests that address this issue. Some of them were filed at the beginning of this century!

However, all related entries I found were closed and/or marked as duplicates. Meanwhile, every consumer PC has enough memory that this issue is becoming more and more important.

Therefore I am asking myself:

What is the official (Java) way to handle large in-memory data, e.g. storing 4 GB in RAM?

If there is no official solution, what is the common solution used by the community?

Note: I do not consider saving the data to temporary files a solution. Servers with more than 100 GB of RAM are not uncommon...

Robert asked Oct 12 '25 19:10



2 Answers

There is no such thing as an "official" way. I have never come across anything about this problem in the official Java Language Specification.

Generally speaking, though, you can always represent such a big array as an array of arrays, i.e. byte[][]. In this case each element of the top-level array describes a "page" of your storage. This theoretically allows you to store about 2^31 × 2^31 = 2^62 bytes.
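A minimal sketch of the page idea, assuming a power-of-two page size so that a `long` index splits into a page number (high bits) and an offset within that page (low bits). The class name `PagedByteArray` and its `pageShift` parameter are hypothetical, chosen only for illustration; the page size is capped at 2^30 bytes so that every page is still a legal `byte[]`.

```java
/**
 * Sketch of a long-indexed byte array built as byte[][]:
 * each element of the top-level array is one "page".
 */
final class PagedByteArray {
    private final int pageShift;  // log2 of the page size (hypothetical parameter)
    private final long pageMask;  // pageSize - 1, extracts the offset inside a page
    private final byte[][] pages;
    private final long length;

    PagedByteArray(long length, int pageShift) {
        if (length < 0) throw new IllegalArgumentException("length must be >= 0");
        if (pageShift < 1 || pageShift > 30) {
            throw new IllegalArgumentException("pageShift must be in [1, 30]");
        }
        this.length = length;
        this.pageShift = pageShift;
        long pageSize = 1L << pageShift;
        this.pageMask = pageSize - 1;

        // Number of pages, rounded up; the top-level array itself is int-indexed.
        long pageCount = (length + pageSize - 1) >>> pageShift;
        if (pageCount > Integer.MAX_VALUE) throw new IllegalArgumentException("too many pages");
        pages = new byte[(int) pageCount][];
        for (int i = 0; i < pages.length; i++) {
            // The last page may be shorter than pageSize.
            long remaining = length - ((long) i << pageShift);
            pages[i] = new byte[(int) Math.min(pageSize, remaining)];
        }
    }

    long length() {
        return length;
    }

    byte get(long index) {
        checkIndex(index);
        // High bits select the page, low bits select the byte inside it.
        return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
    }

    void set(long index, byte value) {
        checkIndex(index);
        pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + " out of [0, " + length + ")");
        }
    }
}
```

For example, `new PagedByteArray(4L << 30, 30)` would request 4 GB as four 1 GB pages; that only succeeds if the JVM was started with a large enough heap (e.g. via `-Xmx`).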

Andremoniy answered Oct 14 '25 09:10



Java, as a general-purpose language, has neither specific tools for handling large in-memory data out of the box nor any special official recommendations for it so far.

You have the following options when using Java to work with as much memory as possible within a single JVM:

  • As already mentioned in this thread, rely on on-heap arrays of arrays, or even better, on arrays of java.nio.Buffer wrappers around those arrays
  • Maintain a set of off-heap direct java.nio.ByteBuffers, taking care of the capacity of each one because of the constraints of 32-bit (int) indexing. Example. Memory-mapped files also deserve a mention here
  • Use an in-process, in-memory database like H2, keeping in mind its own limitations (H2 can even rely on its own in-memory file system)
  • Use out-of-process memory storage like Memcached with a corresponding Java client
  • Set up a RAM disk (or use tmpfs or something similar) and work with that memory as a file system from Java
  • ...
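For the off-heap option in the second bullet, a sketch under similar assumptions might keep several direct buffers and translate a `long` index into a (chunk, offset) pair. The class `ChunkedDirectBuffer` and its `chunkSize` parameter are hypothetical; `ByteBuffer.allocateDirect` and the absolute `get(int)` / `put(int, byte)` methods are standard `java.nio` API. A multi-byte value can straddle a chunk boundary, which is the kind of capacity bookkeeping the bullet refers to; this sketch simply assembles `long`s byte by byte (big-endian), which is simple but slow.

```java
import java.nio.ByteBuffer;

/**
 * Sketch of a long-addressable off-heap store made of several direct
 * ByteBuffers, each at most chunkSize bytes (ByteBuffer uses int indices).
 */
final class ChunkedDirectBuffer {
    private final ByteBuffer[] chunks;
    private final int chunkSize;   // hypothetical parameter, at most Integer.MAX_VALUE
    private final long capacity;

    ChunkedDirectBuffer(long capacity, int chunkSize) {
        if (capacity < 0 || chunkSize <= 0) throw new IllegalArgumentException("bad size");
        this.capacity = capacity;
        this.chunkSize = chunkSize;
        long count = (capacity + chunkSize - 1) / chunkSize;  // round up
        if (count > Integer.MAX_VALUE) throw new IllegalArgumentException("too many chunks");
        chunks = new ByteBuffer[(int) count];
        for (int i = 0; i < chunks.length; i++) {
            long remaining = capacity - (long) i * chunkSize;
            // allocateDirect reserves memory outside the Java heap.
            chunks[i] = ByteBuffer.allocateDirect((int) Math.min(chunkSize, remaining));
        }
    }

    long capacity() {
        return capacity;
    }

    byte get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)].get((int) (index % chunkSize));
    }

    void put(long index, byte value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)].put((int) (index % chunkSize), value);
    }

    // Reads 8 bytes starting at index, most significant byte first;
    // works even when the bytes span two chunks.
    long getLong(long index) {
        long result = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            result = (result << 8) | (get(index + i) & 0xFFL);
        }
        return result;
    }

    // Writes 8 bytes starting at index, most significant byte first.
    void putLong(long index, long value) {
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            put(index + i, (byte) value);  // keeps the lowest 8 bits
            value >>>= 8;
        }
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index + " out of [0, " + capacity + ")");
        }
    }
}
```

The total amount of direct memory is capped by the JVM (see `-XX:MaxDirectMemorySize`). The same chunking applies to memory-mapped files, since a single `FileChannel.map` call can map at most `Integer.MAX_VALUE` bytes. A faster version could use `ByteBuffer.getLong(int)` whenever a value fits entirely inside one chunk.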

Each approach has its own drawbacks and advantages in terms of read/write speed, footprint, durability, maintainability, etc. At the same time, the choice depends on the nature of the objects being stored in memory, their lifecycle, their access pattern, etc.

So the right choice has to be worked out by carefully matching it against your particular requirements and use cases.

Kostiantyn answered Oct 14 '25 08:10



