
How to implement Concurrent read to a file mapped to memory in Java?

I have many threads that concurrently read the same file (about 100MB in total), and only one thread that updates the file. I want to map the file into memory to reduce file I/O. How can this be done in Java?

I have basically considered the following two methods:

  1. Store the file in a byte array, and have each reading thread create its own ByteArrayInputStream over that array (sketched below).
  2. Use NIO to get one file channel, and synchronize reads from its MappedByteBuffer across the threads.

I'm not sure whether these methods would work. Please give some hints if there is a better solution.
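
For illustration only (this code is not from the original question), here is a minimal sketch of option 1, assuming the roughly 100MB file fits comfortably in the heap and using a hypothetical path "data.bin":

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class SharedByteArrayRead {
        public static void main(String[] args) throws Exception {
            // Load the whole file into one shared, effectively read-only byte array.
            final byte[] data = Files.readAllBytes(Paths.get("data.bin")); // hypothetical path

            Runnable reader = () -> {
                // Each thread wraps the shared array in its own stream, so there is
                // no shared read position that would need synchronization.
                try (InputStream in = new ByteArrayInputStream(data)) {
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        // process n bytes from chunk ...
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            };

            Thread t1 = new Thread(reader);
            Thread t2 = new Thread(reader);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
    }

The drawback is that the whole file lives on the Java heap, and the single updater thread would need some scheme (for example, publishing a new array) to make its changes visible safely.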

asked May 14 '12 by Simon Wang


1 Answer

Use NIO, with each thread creating its own mapping and reading the data into its own private buffer. Keep the private buffer size optimal. The OS reads the file into its file cache in pages, and those pages are then copied into the private buffers. If the same regions are read by multiple threads, the data is read from the same pages in the file cache, saving some file I/O cycles. Below is a small diagram to illustrate this; hopefully it helps to understand better.

[Diagram: memory-mapped file I/O]
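
To make this concrete, here is a minimal sketch (my own illustration, not the answerer's code) of each thread creating its own read-only mapping and copying from it into a private buffer; the file path "data.bin" and the 64KB buffer size are assumptions:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class PerThreadMappingRead {
        public static void main(String[] args) throws Exception {
            Runnable reader = () -> {
                // Each thread opens its own channel and creates its own mapping,
                // so there is no shared buffer position to synchronize on.
                try (FileChannel channel = FileChannel.open(
                        Paths.get("data.bin"), StandardOpenOption.READ)) { // hypothetical path
                    MappedByteBuffer mapped =
                            channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                    byte[] privateBuffer = new byte[64 * 1024]; // tune this size
                    while (mapped.hasRemaining()) {
                        int n = Math.min(privateBuffer.length, mapped.remaining());
                        // Copies from the OS file cache into this thread's private buffer.
                        mapped.get(privateBuffer, 0, n);
                        // process n bytes from privateBuffer ...
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            };

            Thread t1 = new Thread(reader);
            Thread t2 = new Thread(reader);
            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
    }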

With reference to the diagram above, here is some more explanation. A region of the file is mapped to memory. Creating a mapping is just a logical marking that says you want to read a particular portion of the file; once the mapping is created, the mapped region is ready to be read. When you start reading, the OS fetches the file data into pages in its file cache. The region could be mapped to one or more pages.

Now you read the pages into your own private buffer (multiple pages at a time, to optimize). Some other thread could be reading the same region as the first one, so it also reads the same pages into its private buffer. Note that this second read happens from the file cache without page faults.

After you have processed your private buffer, you request to read further. Note that you are reading one portion of your mapping into your private buffer at a time. For example, your file could be 100MB and you map a 10MB portion of it to memory; with a 40KB private buffer, you first read 40KB out of the 10MB, then request the next 40KB, and so on. The OS checks whether the data you want to read has already been fetched into the cache; if not, a page fault occurs and the OS fetches the requested data into its pages. Again, this data can be shared if multiple threads request to read the same region.

You could very well read from the file cache itself (that is, directly from the mapped buffer) instead of copying into your own private buffer. But this can lead to repeated page faults if the file is concurrently read many times across multiple regions, so in that case it is better to have a private buffer of optimal size.
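
As a sketch of the chunked reading described above (again my own illustration, with the 10MB region and 40KB buffer sizes taken from the example numbers and the file path assumed), mapping a region and draining it chunk by chunk might look like this:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ChunkedRegionRead {
        public static void main(String[] args) throws IOException {
            final long regionStart = 0;                        // where the mapped region begins
            final long regionSize  = 10L * 1024 * 1024;        // map a 10MB portion of the file
            final byte[] privateBuffer = new byte[40 * 1024];  // 40KB private buffer

            try (FileChannel channel = FileChannel.open(
                    Paths.get("data.bin"), StandardOpenOption.READ)) { // hypothetical path
                long mapSize = Math.min(regionSize, channel.size() - regionStart);
                MappedByteBuffer region =
                        channel.map(FileChannel.MapMode.READ_ONLY, regionStart, mapSize);

                // Drain the mapped region 40KB at a time; the first access to each page
                // may fault it into the OS file cache, later accesses are served from it.
                while (region.hasRemaining()) {
                    int n = Math.min(privateBuffer.length, region.remaining());
                    region.get(privateBuffer, 0, n);
                    // process n bytes from privateBuffer ...
                }
            }
        }
    }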

answered Sep 20 '22 by Drona