
How should I deal with a very large array in Java?

Tags:

java

nio

I have an algorithm which currently allocates a very large array of doubles, which it updates and searches frequently. The size of the array is N^2/2, where N is the number of rows on which the algorithm is operating. I also have to keep a copy of the entire thing for purposes associated with the application surrounding the algorithm.

Of course this imposes a limit on the number of rows that my algorithm can handle as I have the heap limitation to contend with. Up to this point I have got away with asking the people using the algorithm to update the -Xmx setting to allocate more space, and that has worked fine. However, I now have a genuine problem where I need this array to be larger than I can fit into memory.

I already have plans to change my algorithm to remove the need for this large array, and I have some promising results in that direction. However, it is a fundamental alteration to the process and will require a lot more work before it reaches the highly polished state of my current code, which has been running very successfully in production for several years.

So, while I am perfecting my new algorithm I wanted to extend the life of the existing one and that means tackling the heap limitation associated with allocating my huge array of doubles.

My question is: what is the best way of dealing with it? Should I use an nio FileChannel and a MappedByteBuffer, or is there a better approach? If I do use the nio approach, what sort of performance hit should I expect compared to an in-memory array of the same size?

Thanks

asked Dec 16 '09 by Simon

People also ask

How do you handle a large array in Java?

Java arrays are indexed by int, so an array can't have more than 2^31 - 1 = 2,147,483,647 elements (array lengths are signed ints, and in practice most JVMs cap the length a few elements below that). A plain int[] of that maximum size consumes roughly 8,589,934,588 bytes (≈ 8 GB). Thus, the int index is usually not the limitation, since you would run out of memory anyway.

How do you declare a large array in Java?

// Syntactically valid, but far too large to ever actually allocate on a real heap:
long[][] multiArray = new long[Integer.MAX_VALUE][Integer.MAX_VALUE];

How big can Java arrays be?

The theoretical maximum Java array size is 2,147,483,647 elements. To find the size of an existing Java array, read its length field.

How do you fix the size of an array in Java?

If you want to change the size, you need to create a new array of the desired size, copy the elements from the old array into it, and use the new array from then on (see the sketch below). Note also that an array's element type is fixed: arrays can hold primitive values, unlike ArrayList, which can only hold object values.
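For illustration, a minimal sketch of that copy-and-grow pattern using java.util.Arrays.copyOf (the array name and sizes are made up for the example):

import java.util.Arrays;

public class GrowArrayExample {
    public static void main(String[] args) {
        int[] arr = new int[1000];                 // original fixed-size array of int values
        // ... fill arr ...
        int[] bigger = Arrays.copyOf(arr, 2000);   // first 1000 elements copied, the rest default to 0
        arr = bigger;                              // use the larger array from here on
    }
}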


2 Answers

If you are starting to run out of available memory, then you will probably also soon start to run out of available array indexes: an array is bounded in size to Integer.MAX_VALUE elements, which for an array of doubles is "only" about 16GB.

Getting a machine with 32GB of memory is expensive, but probably not as expensive as your time to modify the algorithm, and all of the associated testing.

However, if the client is running to the edges of memory, and their datasets are still growing, then it makes sense for you to bite the bullet now, and make the changes to be able to use less memory at any given time, since they will likely soon outgrow an array anyway.

The other option that you have, assuming that the array is somewhat sparsely filled, is to use one of the various sparse array data structures, although these tend to only be beneficial if your array is less than 20% full.
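For illustration only (none of these names come from the question), a minimal sketch of a hand-rolled sparse structure: a HashMap keyed by the (row, column) pair packed into a single long, storing only non-zero cells:

import java.util.HashMap;
import java.util.Map;

// Sketch of a sparse 2-D array of doubles: only non-zero cells are stored.
class SparseDoubleMatrix {
    private final Map<Long, Double> cells = new HashMap<>();

    private static long key(int row, int col) {
        // Pack the two int indexes into one long map key.
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    void set(int row, int col, double value) {
        if (value == 0.0) {
            cells.remove(key(row, col));      // do not store zeros
        } else {
            cells.put(key(row, col), value);
        }
    }

    double get(int row, int col) {
        return cells.getOrDefault(key(row, col), 0.0);
    }
}

The per-entry boxing and hashing overhead is why such structures only pay off when the matrix is mostly empty.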

Edit: Since it seems that you have already investigated the alternatives, then the MappedByteBuffer may well be the way to go. Obviously this is going to have a performance impact, however if you do mostly sequential reads and writes from the array, then this should not be too bad. If you are doing random reads and writes, then this is going to get very slow very fast. Or very slow very slowly... depending on how you look at these things ;-)
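For reference, a minimal sketch of what the MappedByteBuffer route might look like, assuming the array lives in a scratch file and is mapped in chunks because a single mapping is limited to Integer.MAX_VALUE bytes (class, field and parameter names are illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.DoubleBuffer;
import java.nio.channels.FileChannel;

// Sketch of a file-backed array of doubles, split into 1 GB mappings.
class MappedDoubleArray implements AutoCloseable {
    private static final long DOUBLES_PER_CHUNK = 1L << 27;   // 128M doubles = 1 GB per mapping
    private final RandomAccessFile file;
    private final DoubleBuffer[] chunks;

    MappedDoubleArray(String path, long length) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(length * Double.BYTES);
        FileChannel channel = file.getChannel();
        int chunkCount = (int) ((length + DOUBLES_PER_CHUNK - 1) / DOUBLES_PER_CHUNK);
        chunks = new DoubleBuffer[chunkCount];
        for (int i = 0; i < chunkCount; i++) {
            long start = i * DOUBLES_PER_CHUNK;
            long count = Math.min(DOUBLES_PER_CHUNK, length - start);
            chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE,
                                    start * Double.BYTES,
                                    count * Double.BYTES).asDoubleBuffer();
        }
    }

    double get(long index) {
        return chunks[(int) (index / DOUBLES_PER_CHUNK)].get((int) (index % DOUBLES_PER_CHUNK));
    }

    void set(long index, double value) {
        chunks[(int) (index / DOUBLES_PER_CHUNK)].put((int) (index % DOUBLES_PER_CHUNK), value);
    }

    @Override
    public void close() throws IOException {
        file.close();   // the mappings themselves are released when garbage-collected
    }
}

Sequential passes over such a structure mostly hit pages the OS has already read ahead; random access pays a page fault and a disk seek far more often, which is where the slowdown warned about above comes from.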

answered by Paul Wagland


If you're running on PCs, page sizes for mapped files are likely to be 4 kilobytes.

So the real question becomes: once I start swapping the data out to disk, how random is my random access to the RAM-that-is-now-a-file?

And, if I can influence it, how should I order the doubles so that values within the same 4K page tend to be accessed together, rather than touching only a few values in each page before the next 4K disk fetch?
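To make the page arithmetic concrete (assuming 8-byte doubles and 4 KB pages, which is not guaranteed on every platform):

// 4096 bytes per page / 8 bytes per double = 512 doubles per page,
// so element i of the mapped array lives on page i / 512.
class PageMath {
    static long pageOf(long elementIndex) {
        return elementIndex / (4096 / Double.BYTES);
    }
}

Accesses that land on the same pageOf value share one 4K fetch; accesses scattered across many pages can each trigger their own fetch.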

If you use standard IO, you probably still want to read and write in chunks, but the chunks could be smaller. Sectors will be at least 512 bytes and disk clusters bigger, but what size of read is best, given that there is a kernel round-trip overhead for each IO?
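A rough sketch of the chunked standard-IO alternative, assuming the doubles live in a plain file and using a positional FileChannel read (the class and method names are made up here):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class ChunkedDoubleReader {
    // Fetch `count` doubles starting at element `firstIndex` with a single
    // positional read, i.e. one kernel round trip per chunk.
    static double[] readDoubles(FileChannel channel, long firstIndex, int count) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(count * Double.BYTES);
        channel.read(buffer, firstIndex * (long) Double.BYTES);
        buffer.flip();
        double[] out = new double[count];
        buffer.asDoubleBuffer().get(out);   // assumes the full chunk was read
        return out;
    }
}

Benchmarking a few chunk sizes (from a handful of pages up to a few megabytes) against the real access pattern is the only reliable way to pick one.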

I'm sorry but I'm afraid your best next steps depend to a great extent on the algorithm and the data you are using.

answered by martinr