
Java NIO MappedByteBuffer OutOfMemoryException

I am really in trouble: I want to read HUGE files (several GB) using FileChannels and MappedByteBuffers. All the documentation I found implies that it's rather simple to map a file using the FileChannel.map() method. Of course there is a limit at 2 GB, as all the Buffer methods use int for position, limit and capacity - but what about the system-imposed limits below that?

In reality, I get lots of OutOfMemoryExceptions, and there is no documentation at all that really defines the limits! So - how can I safely map a file that fits into the int limit into one or several MappedByteBuffers without just getting exceptions?

Can I ask the system which portion of a file I can safely map before I try FileChannel.map()? How? And why is there so little documentation about this feature?

Zordid asked Sep 21 '12

2 Answers

I can offer some working code. Whether this solves your problem or not is difficult to say. This hunts through a file for a pattern recognised by the Hunter.

See the excellent article Java tip: How to read files quickly for the original research (not mine).

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// 4k buffer size.
static final int SIZE = 4 * 1024;
static byte[] buffer = new byte[SIZE];

// Fastest because a FileInputStream has an associated channel.
private static void scanDataFile(Hunter p, FileInputStream f) throws FileNotFoundException, IOException {
  // Use a mapped and buffered stream for best speed.
  // See: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
  FileChannel ch = f.getChannel();
  long red = 0L;
  do {
    long read = Math.min(Integer.MAX_VALUE, ch.size() - red);
    MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, red, read);
    int nGet;
    while (mb.hasRemaining() && p.ok()) {
      nGet = Math.min(mb.remaining(), SIZE);
      mb.get(buffer, 0, nGet);
      for (int i = 0; i < nGet && p.ok(); i++) {
        p.check(buffer[i]);
      }
    }
    red += read;
  } while (red < ch.size() && p.ok());
  // Finish off.
  p.close();
  ch.close();
  f.close();
}
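The Hunter type is not shown in the answer; here is a minimal sketch of what one might look like. Only the method names ok(), check() and close() are implied by the calls in scanDataFile - the KMP-style byte matching and the foundAt() accessor are illustrative assumptions:

// A hypothetical Hunter: streams bytes through a KMP matcher.
// Only ok(), check() and close() are implied by the code above;
// the matching logic is an illustrative assumption.
static class Hunter {
  private final byte[] pattern;
  private final int[] fail; // KMP failure table
  private int matched = 0;
  private long position = 0;
  private long foundAt = -1;

  Hunter(byte[] pattern) {
    this.pattern = pattern;
    this.fail = new int[pattern.length];
    for (int i = 1, k = 0; i < pattern.length; i++) {
      while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
      if (pattern[i] == pattern[k]) k++;
      fail[i] = k;
    }
  }

  // Keep scanning until the pattern has been found.
  boolean ok() { return foundAt < 0; }

  // Feed one byte of the file to the matcher.
  void check(byte b) {
    while (matched > 0 && b != pattern[matched]) matched = fail[matched - 1];
    if (b == pattern[matched]) matched++;
    position++;
    if (matched == pattern.length) foundAt = position - pattern.length;
  }

  // Absolute file offset of the match, or -1 if none.
  long foundAt() { return foundAt; }

  void close() { /* nothing to release in this sketch */ }
}

Used with the method above (file name assumed):

Hunter p = new Hunter("needle".getBytes());
FileInputStream f = new FileInputStream("huge.dat");
scanDataFile(p, f); // closes the stream itself
System.out.println("found at: " + p.foundAt());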
OldCurmudgeon answered Oct 21 '22

What I use is a List<ByteBuffer> where each ByteBuffer maps the file in blocks of 16 MB to 1 GB. I use powers of 2 to simplify the logic. I have used this to map files up to 8 TB.
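A minimal sketch of that chunking scheme might look like the following - the class name, the fixed 1 GB chunk size, and the get() accessor are assumptions for illustration, not the answerer's actual code:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Sketch of mapping a huge file as a list of fixed-size chunks.
// Mappings stay valid after the channel is closed.
class ChunkedMappedFile {
  static final long CHUNK = 1L << 30; // 1 GB per mapping (a power of 2)

  private final List<MappedByteBuffer> chunks = new ArrayList<>();

  ChunkedMappedFile(String path) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r");
         FileChannel ch = raf.getChannel()) {
      for (long pos = 0; pos < ch.size(); pos += CHUNK) {
        long size = Math.min(CHUNK, ch.size() - pos);
        chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, size));
      }
    }
  }

  // Read one byte at an absolute file position. Because CHUNK is a
  // power of 2, chunk index and offset are cheap shifts and masks.
  byte get(long position) {
    int chunk = (int) (position >>> 30);
    int offset = (int) (position & (CHUNK - 1));
    return chunks.get(chunk).get(offset);
  }
}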

A key limitation of memory-mapped files is that you are limited by your virtual memory address space. If you have a 32-bit JVM you won't be able to map in very much.

I wouldn't keep creating new memory mappings for a file, because these are never cleaned up. You can create lots of them, but there appears to be a limit of about 32K of them on some systems, no matter how small they are.

The main reason I find MemoryMappedFiles useful is that they don't need to be flushed (provided you can assume the OS won't die). This lets you write data in a low-latency way, without worrying about losing too much data if the application dies, or about sacrificing performance with explicit write() or flush() calls.
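As a sketch of that write pattern - the file name and the 64 MB region size are illustrative assumptions:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Low-latency writes through a memory mapping. Writes land in the
// page cache immediately; the OS persists dirty pages in the
// background, so no write()/flush() calls sit on the hot path.
class MappedWriter {
  public static void main(String[] args) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile("journal.dat", "rw");
         FileChannel ch = raf.getChannel()) {
      MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64 << 20);
      mb.putLong(System.nanoTime());
      mb.put("event".getBytes());
      // Optional explicit sync to disk - costly, and rarely needed
      // if you trust the OS to stay up, as noted above.
      // mb.force();
    }
  }
}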

Peter Lawrey answered Oct 21 '22