
Maximizing Java Heap Space

I'm trying to use very large square matrices in Java, on the order of n = 1e6 or more. The matrices aren't sparse, so I don't see much way around representing them as a 2D array, which requires n^2 * sizeof(int) bytes of memory. Obviously, I'm getting out-of-memory errors (java.lang.OutOfMemoryError), even when adding JVM flags (e.g. -Xmx) to use as large a heap as my machine will allow.
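To put a number on that requirement, here is a back-of-the-envelope sketch. The class and method names are made up for illustration, and per-array object headers are ignored:

```java
final class DenseMatrixFootprint {
    /** Bytes needed for the int payload of an n x n matrix (array headers ignored). */
    static long payloadBytes(long n) {
        // multiplyExact throws ArithmeticException instead of silently overflowing a long
        return Math.multiplyExact(Math.multiplyExact(n, n), (long) Integer.BYTES);
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(1_000_000L);
        System.out.printf("%d bytes, about %.2f TiB%n", bytes, bytes / Math.pow(2, 40));
    }
}
```

For n = 1e6 the payload is 4 * 10^12 bytes, roughly 250 times the 16 GB of RAM on the machine in question, so no heap setting alone can hold the whole matrix in memory.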

I'm willing to assume I have the perfect computer (unlimited RAM, etc.) for the sake of the question, though in reality I'm on a 64-bit machine with 16 GB of RAM. My machine seems only partly relevant, since the limit I'm hitting is the JVM's rather than the hardware's (beyond the fact that the JVM can't have more memory than the physical machine).

I understand (and it is cited, e.g., in "Making a very large Java array") that a Java array can't hold, even theoretically, more than Integer.MAX_VALUE elements, as an int is used for indexing.

My question is: are there any ways to coax extra memory out of the JVM heap?

I understand that, if there are, they probably won't get me an order of magnitude more space.

For Example

In C, I can declare static constant variables and have them moved to the data section of the code, which will have much more space than the heap and much, much more than the stack (see "Where are static variables stored (in C/C++)?").

In Java, it appears that even if I copy the variable into a "data" section, the value still goes onto the main heap (see "static allocation in java - heap, stack and permanent generation"), which means I've succeeded in moving one whole byte off of the heap (yay!).

My Solution

My "solution" isn't really a solution. I made a simple data structure that uses the RandomAccessFile I/O procedures to replace array accesses with reads and writes to an external file. It's still constant-time access, but we went from one of Java's fastest operations to a very, very slow procedure (though we can pull in "cache" lines from the file all at once, which makes the process tremendously speedier). Better ideas?
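For readers who want something concrete, the following is a minimal sketch of the kind of file-backed structure described above. It is not the original code: the class name, the row-major layout, and the demo method are all assumptions made for illustration, and it does no caching of file "lines".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: an n x n int matrix stored row-major in a file.
final class FileBackedIntMatrix implements AutoCloseable {
    private final RandomAccessFile file;
    private final long n;

    FileBackedIntMatrix(Path path, long n) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "rw");
        this.n = n;
        // Reserve room for n*n ints; the API leaves the new bytes' contents undefined.
        file.setLength(n * n * Integer.BYTES);
    }

    /** Byte position of element (row, col) in the file. */
    private long offset(long row, long col) {
        return (row * n + col) * Integer.BYTES;
    }

    int get(long row, long col) throws IOException {
        file.seek(offset(row, col));
        return file.readInt();
    }

    void set(long row, long col, int value) throws IOException {
        file.seek(offset(row, col));
        file.writeInt(value);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Tiny round trip on a temporary 4 x 4 matrix: writes one value and reads it back. */
    static int demo() {
        try {
            Path path = Files.createTempFile("matrix", ".bin");
            path.toFile().deleteOnExit();
            try (FileBackedIntMatrix m = new FileBackedIntMatrix(path, 4)) {
                m.set(2, 3, 42);
                return m.get(2, 3);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Every get and set here costs a seek plus a read or write, which is why this approach is so much slower than a plain array access.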

Not My Question

I'm not asking how to make an array above Java's maximum array size. It's not possible. These are nested arrays: a single n-sized array is fine, but n of them cause problems.

I'm not asking this: "How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)". Garbage collection isn't relevant: I can't even make the array, let alone worry about when it gets deleted.

I also can't use an iterator (I think), which would otherwise be a possibility; a function like matrix multiplication needs to be able to index directly.

Note: Java isn't the right language in which to do operations on very large matrices. I'd be better off using an abacus. But here I am and that's outside my control.

en_Knight asked Nov 05 '14 06:11


1 Answer

There are some missing aspects to your original question; for instance, I cannot believe that you have to use such large matrices and just "forget them" between runs. Well, maybe you do, I don't know.

Anyway: your usage of RandomAccessFile is, imho, nearly there; if I were you, though, I'd use FileChannel.map(). On Unix systems, it's basically a way to call mmap(2). In the scenario below, I assume that you have a FileChannel to your matrix (I take it you understand what I mean).

Since you use matrices, and since it looks like the values at any given "coordinates" in the matrix all have the same length, you can easily compute the offset into the file at which to read and/or write a given value. Of course, you won't want to map just that value, but a window containing it; make the window large enough to be useful, and do NOT worry about heap space consumption: FileChannel.map() does not consume heap space (save for object bookkeeping). On 64-bit JVMs, you need not worry; had you been using a 32-bit JVM, you'd have had to account for address space exhaustion.
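The offset computation and the window mapping might look like the sketch below. It assumes a row-major layout of 4-byte ints and uses one matrix row as the window; the class name, method names, and the temporary file are illustrative only, not something given in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: map one row of a row-major n x n int matrix stored in a file.
final class MappedRowSketch {
    /** Byte offset of element (row, col) in the file. */
    static long byteOffset(long n, long row, long col) {
        return (row * n + col) * Integer.BYTES;
    }

    /** Maps the window holding one whole row, so reads within that row hit mapped memory. */
    static MappedByteBuffer mapRow(FileChannel channel, long n, long row) throws IOException {
        long start = byteOffset(n, row, 0);
        long length = n * Integer.BYTES; // one row; must not exceed Integer.MAX_VALUE
        return channel.map(FileChannel.MapMode.READ_WRITE, start, length);
    }

    /** Round trip on a temporary 8 x 8 matrix: writes element (5, 3) and reads it back. */
    static int demo() {
        try {
            long n = 8;
            Path path = Files.createTempFile("matrix", ".bin");
            path.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Grow the file to its full size first by writing one byte at the very end.
                channel.write(ByteBuffer.wrap(new byte[] {0}), n * n * Integer.BYTES - 1);
                MappedByteBuffer window = mapRow(channel, n, 5);
                // Position of (5, 3) relative to the start of the mapped row.
                int inWindow = (int) (byteOffset(n, 5, 3) - byteOffset(n, 5, 0));
                window.putInt(inWindow, 42);
                return window.getInt(inWindow);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A row is only one possible window shape; for access patterns that walk columns, such as the right-hand operand of a matrix multiplication, a blocked layout may map better, but that depends on the program.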

There is, of course, the problem of expiry: how long do you need this or that mapping to remain active? This is entirely dependent on your program and what you do with it. But using a FileChannel and mapping the relevant zones is the way to go. However, be aware that a single call to FileChannel.map() cannot map more than 2^31 - 1 bytes (Integer.MAX_VALUE); settle for 2^30 byte (1 GiB) windows, for instance; and remember that you can view ByteBuffers as IntBuffers.
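As a rough sketch of that windowing scheme: the file is cut into fixed-size windows, each element belongs to exactly one window, and each mapped window is viewed as an IntBuffer. The names below are hypothetical, and the window size is a parameter so that the demo can use a tiny window instead of the 1 GiB suggested above. The window size is assumed to be a multiple of 4 so that no int straddles two windows.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: fixed-size mapped windows over a row-major n x n int matrix.
final class WindowedIntViewSketch {
    static final long ONE_GIB = 1L << 30; // the window size suggested in the text

    /** Which window holds element (row, col). */
    static long windowIndex(long n, long row, long col, long windowBytes) {
        return (row * n + col) / (windowBytes / Integer.BYTES);
    }

    /** Index of element (row, col) inside its window's IntBuffer. */
    static int indexInWindow(long n, long row, long col, long windowBytes) {
        return (int) ((row * n + col) % (windowBytes / Integer.BYTES));
    }

    /** Maps one window (the last one may be shorter) and views it as ints. */
    static IntBuffer mapWindow(FileChannel channel, long window, long windowBytes,
                               long fileBytes) throws IOException {
        long start = window * windowBytes;
        long length = Math.min(windowBytes, fileBytes - start);
        return channel.map(FileChannel.MapMode.READ_WRITE, start, length).asIntBuffer();
    }

    /** Round trip on a temporary 8 x 8 matrix with 64-byte windows. */
    static int demo() {
        try {
            long n = 8, windowBytes = 64, fileBytes = n * n * Integer.BYTES;
            Path path = Files.createTempFile("matrix", ".bin");
            path.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Grow the file to its full size before mapping any window.
                channel.write(ByteBuffer.wrap(new byte[] {0}), fileBytes - 1);
                long window = windowIndex(n, 5, 3, windowBytes);
                IntBuffer ints = mapWindow(channel, window, windowBytes, fileBytes);
                int i = indexInWindow(n, 5, 3, windowBytes);
                ints.put(i, 7);
                return ints.get(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real program you would pass ONE_GIB as windowBytes and keep a small cache of mapped windows; when to drop a window is the expiry question above and is left open here.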


Edit: some relevant links:

  • FileChannel.open();
  • FileChannel.map();
  • ByteBuffer, and its asIntBuffer() method;
  • IntBuffer.
fge answered Oct 21 '22 11:10