Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java NIO - Memory mapped files

I recently came across this article which provided a nice intro to memory mapped files and how it can be shared between two processes. Here is the code for a process that reads in the file:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MemoryMapReader {

 /**
  * @param args
  * @throws IOException 
  * @throws FileNotFoundException 
  * @throws InterruptedException 
  */
 public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {

  FileChannel fc = new RandomAccessFile(new File("c:/tmp/mapped.txt"), "rw").getChannel();

  long bufferSize=8*1000;
  MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, bufferSize);
  long oldSize=fc.size();

  long currentPos = 0;
  long xx=currentPos;

  long startTime = System.currentTimeMillis();
  long lastValue=-1;
  for(;;)
  {

   while(mem.hasRemaining())
   {
    lastValue=mem.getLong();
    currentPos +=8;
   }
   if(currentPos < oldSize)
   {

    xx = xx + mem.position();
    mem = fc.map(FileChannel.MapMode.READ_ONLY,xx, bufferSize);
    continue;   
   }
   else
   {
     long end = System.currentTimeMillis();
     long tot = end-startTime;
     System.out.println(String.format("Last Value Read %s , Time(ms) %s ",lastValue, tot));
     System.out.println("Waiting for message");
     while(true)
     {
      long newSize=fc.size();
      if(newSize>oldSize)
      {
       oldSize = newSize;
       xx = xx + mem.position();
       mem = fc.map(FileChannel.MapMode.READ_ONLY,xx , oldSize-xx);
       System.out.println("Got some data");
       break;
      }
     }   
   }

  }

 }

}

I have, however, a few comments/questions regarding that approach:

If we execute the reader only on an empty file, i.e run

  long bufferSize=8*1000;
  MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, bufferSize);
  long oldSize=fc.size();

This will allocate 8000 bytes which will now extend the file. The buffer that this returns has a limit of 8000 and a position of 0, therefore, the reader can proceed and read empty data. After this happens, the reader will stop, as currentPos == oldSize.

Supposedly now the writer comes in (code is omitted as most of it is straightforward and can be referenced from the website) - it uses the same buffer size, so it will write first 8000 bytes, then allocate another 8000, extending the file. Now, if we suppose this process pauses at this point, and we go back to the reader, then the reader sees the new size of the file and allocates the remainder (so from position 8000 until 1600) and starts reading again, reading in another garbage...

I am a bit confused whether there is a why to synchronize those two operations. As far as I see it, any call to map might extend the file with really an empty buffer (filled with zeros) or the writer might have just extended the file, but has not written anything into it yet...

like image 700
Bober02 Avatar asked Mar 03 '14 17:03

Bober02


1 Answers

I do a lot of work with memory-mapped files for interprocess communication. I would not recommend Holger's #1 or #2, but his #3 is what I do. But a key point is perhaps that I only ever work with a single writer - things get more complicated if you have multiple writers.

The start of the file is a header section with whatever header variables you need, most importantly a pointer to the end of the written data. The writer should always update this header variable after writing a piece of data, and the reader should never read beyond this variable. A thing called "cache coherency" that all mainstream CPU's have will guarantee that the reader will see memory writes in the same sequence they are written, so the reader will never read uninitialised memory if you follow these rules. (An exception is where the reader and writers are on different servers - cache coherency doesn't work there. Don't try to implement shared memory across different servers!)

There is no limit to how frequently you can update the end-of-file pointer - it's all in memory and there won't be any i/o involved, so you can update it each record or each message you write.

ByteBuffer has versions of 'getInt()' and 'putInt()' methods which take an absolute byte offset, so that's what I use for reading & writing the end-of-file marker...I never use the relative versions when working with memory-mapped files.

There's no way you should use the file size or yet another interprocess method to communicate the end-of-file marker and no need or benefit when you already have shared memory.

like image 76
Tim Cooper Avatar answered Nov 04 '22 07:11

Tim Cooper