 

Java: Advice on handling large data volumes. (Part Deux)

Alright. So I have a very large amount of binary data (let's say, 10GB) distributed over a bunch of files (let's say, 5000) of varying lengths.

I am writing a Java application to process this data, and I want a good design for the data access. Typical usage will be as follows:

  • One way or another, all the data will be read during the course of processing.
  • Each file is (typically) read sequentially, requiring only a few kilobytes at a time. However, it is often necessary to have, say, the first few kilobytes of each file simultaneously, or the middle few kilobytes of each file simultaneously, etc.
  • There are times when the application will want random access to a byte or two here and there.

Currently I am using the RandomAccessFile class to read into byte arrays (and ByteBuffers). My ultimate goal is to encapsulate the data access in some class such that it is fast and I never have to worry about it again. The basic functionality is that I will ask it to read frames of data from specified files, and I wish to minimize the number of I/O operations given the considerations above.
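For reference, a minimal sketch of what I'm doing now (class and method names here are just illustrative, not my actual code):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class FrameReader {
    // Read `length` bytes from `file` starting at `offset` into a new buffer.
    static ByteBuffer readFrame(RandomAccessFile file, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        FileChannel ch = file.getChannel();
        ch.position(offset);
        // Loop because a single read() is not guaranteed to fill the buffer.
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // keep reading until the frame is full or EOF
        }
        buf.flip(); // switch from writing into the buffer to reading from it
        return buf;
    }
}
```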

Examples for typical access:

  • Give me the first 10 kilobytes of all my files!
  • Give me bytes 0 through 999 of file F, then bytes 1 through 1000, then bytes 2 through 1001, and so on.
  • Give me a megabyte of data from file F starting at such and such byte!
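The sliding-window case above is where I'd most like to avoid repeated I/O; one idea (just a sketch, the class name is made up) is to read a region once and hand out overlapping zero-copy views of it:

```java
import java.nio.ByteBuffer;

// Sliding-window view: one I/O read backs many overlapping window requests.
class SlidingWindow {
    private final byte[] chunk; // a region read from the file once

    SlidingWindow(byte[] chunk) {
        this.chunk = chunk;
    }

    // A `len`-byte window starting at `start` within the chunk, without copying.
    ByteBuffer window(int start, int len) {
        return ByteBuffer.wrap(chunk, start, len).slice();
    }
}
```

Each call to `window` shares the backing array, so advancing the window by one byte costs no further I/O.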

Any suggestions for a good design?

asked Dec 09 '22 by Jake

1 Answer

Use Java NIO and MappedByteBuffers, and treat your files as a list of byte arrays. Then let the OS worry about the details of caching, reading, flushing, and so on.
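A minimal sketch of the idea (the class name is illustrative):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReader {
    // Map a file read-only; the OS pages data in lazily and caches it.
    static MappedByteBuffer map(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

Random access (`buffer.get(index)`), sequential reads, and sliding windows all become cheap array-like operations against the mapping. Note that a single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB), so files larger than that have to be mapped in regions.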

answered Dec 21 '22 by Will Hartung