Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handle large data structure in Java

I'm working on a Java application that needs working on very large matrices. For example multiplying two 10 million * 10 million matrices! Of course the Java heap does not have enough space even for storing one of these matrices. What should I do? Should I use databases to store my matrices and bring to memory every needed part and multiply it part after another?

like image 262
user78564 Avatar asked Mar 16 '09 12:03

user78564


1 Answers

First off, a 10 million x 10 million matrix is simply enormous. Assuming doubles for each cell and no storage overhaed, each one of these things is going to be 800 terabytes. Just reading each cell once over from main memory (should it somehow magically fit there, which clearly isn't happening), would take days. Doing it from any sort of plausible SAN (we'll put it on 10GbE) is more likely to be months. And no matrix multiply has O(n) complexity - the normal approaches are O(n^3). So... you aren't doing this with memory mapped files, common databases, or anything of that sort.

Code doing something like this is going to live or die on cache efficiency, where "cache" includes making good use of main memory, local disk drives. Since any storage interface holding more than one 800 terabyte matrix is bound to be a SAN of some sort, you almost certainly involve multiple servers reading and working on different parts of it, too.

There are lots of well-known ways to parallelise matrix multiplication (essentially multiply various-sized sub-matrices and then combining the results), and shift layout so that the access patterns have reasonable cache locality by organizing the data around space-filling curves instead of row/column arrangements. You're certainly going to want to look at the classic LAPACK interfaces and design, Intel's MKL, GotoBLAS as implementations of the BLAS functions tuned to specific modern hardware, and after that you're probably venturing into unexplored territory :-)

like image 148
puetzk Avatar answered Nov 08 '22 09:11

puetzk