 

How to parallel-process data in memory mapped file

As the name "memory-mapped file" indicates, I understand that a part of a large file can be mapped to memory using the MemoryMappedFile class in C# for fast data processing. What I would like to do is process the mapped memory in parallel. To do that, I have the following questions:

  1. Is MemoryMappedViewAccessor thread-safe and safe to use from Parallel.For? I actually wrote a demo program to test this and it seems to work, but I can't find any reference about it. If the answer is yes, I am done. Otherwise,
  2. Is there any way to access the mapped memory directly as an array? I know MemoryMappedViewAccessor has a ReadArray method, but using it duplicates the memory.
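Not part of the original post, but a minimal sketch of the kind of demo described in question 1: many Parallel.For iterations reading through one shared accessor. The class and method names (`MmfParallelDemo`, `ParallelSum`) are my own, and the map is a small pagefile-backed one rather than a large file.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading;
using System.Threading.Tasks;

class MmfParallelDemo
{
    // Sum `count` ints written into a pagefile-backed map, reading them back
    // concurrently through one shared accessor from Parallel.For.
    public static long ParallelSum(int count)
    {
        using var mmf = MemoryMappedFile.CreateNew(null, (long)count * sizeof(int));
        using var accessor = mmf.CreateViewAccessor();

        for (int i = 0; i < count; i++)
            accessor.Write((long)i * sizeof(int), i);

        long total = 0;
        Parallel.For(0, count,
            () => 0L,                                               // per-thread subtotal
            (i, _, local) => local + accessor.ReadInt32((long)i * sizeof(int)),
            local => Interlocked.Add(ref total, local));
        return total;
    }

    static void Main() => Console.WriteLine(ParallelSum(1000)); // 0+1+...+999 = 499500
}
```

Whether concurrent reads through a single accessor are guaranteed safe is exactly what the question asks; this only shows that such a demo can appear to work.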
Tae-Sung Shin asked May 03 '13


People also ask

How do memory-mapped files simplify application programming?

In some cases, memory-mapped files simplify the logic of a program by using memory-mapped I/O. Rather than using fseek() multiple times to jump to random file locations, the data can be accessed directly by using an index into an array. Memory-mapped files provide more efficient access for initial reads.
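The fseek()-vs-array-index contrast above can be sketched in C# as well (the names `SeekVsMap` and `ReadBoth` are mine): a FileStream must reposition for every random read, while a mapped view is indexed directly by offset.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class SeekVsMap
{
    // Read the byte at `index` two ways: by seeking a stream, and by
    // indexing into a memory-mapped view as if the file were an array.
    public static (int seeked, int mapped) ReadBoth(string path, int index)
    {
        int seeked;
        using (var fs = File.OpenRead(path))
        {
            fs.Seek(index, SeekOrigin.Begin);   // reposition for every random read
            seeked = fs.ReadByte();
        }

        int mapped;
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            mapped = accessor.ReadByte(index);  // direct "array" indexing by offset
        }
        return (seeked, mapped);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 10, 20, 30, 40, 50 });
        Console.WriteLine(ReadBoth(path, 3)); // both routes read the same byte: 40
        File.Delete(path);
    }
}
```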

How does a memory-mapped file work?

A memory-mapped file contains the contents of a file in virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to the memory.
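As a small illustration of that shared mapping (my own sketch, not from the original page): two views of the same map see the same physical memory, which is the same mechanism by which separate processes sharing a map see each other's writes.

```csharp
using System;
using System.IO.MemoryMappedFiles;

class SharedViewDemo
{
    // Two views of the same mapping alias the same memory, just as two
    // processes mapping the same file would.
    public static int WriteThenReadViaOtherView()
    {
        using var mmf = MemoryMappedFile.CreateNew(null, 1024);
        using var viewA = mmf.CreateViewAccessor();
        using var viewB = mmf.CreateViewAccessor();

        viewA.Write(0, 42);            // write through one view...
        return viewB.ReadInt32(0);     // ...and the other view sees it
    }

    static void Main() => Console.WriteLine(WriteThenReadViaOtherView()); // 42
}
```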

What are the advantages of memory-mapped files?

Benefits. The benefit of memory mapping a file is increasing I/O performance, especially when used on large files. For small files, memory-mapped files can result in a waste of slack space as memory maps are always aligned to the page size, which is mostly 4 KiB.

Why is memory-mapped I/O faster?

Memory-mapped I/O provides several potential advantages over explicit read/write I/O, especially for low latency devices: (1) It does not require a system call, (2) it incurs almost zero overhead for data in memory (I/O cache hits), and (3) it removes copies between kernel and user space.


1 Answer

You can reason this out. A memory-mapped file is just a chunk of memory in your program whose bytes are accessible by more than one process. MMFs are pretty awkward in managed code, because that chunk exists at a specific address, which requires accessing the data through a pointer, and pointers are taboo in managed code. The MemoryMappedViewAccessor class wraps that pointer; it copies data between managed memory and the shared memory. Do note that this copying defeats the major reason for using MMFs, and is why their support took so long to show up in .NET. Be sure that you don't want to use named pipes instead.
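To make the copy-vs-pointer distinction concrete, here is a sketch (my own, not from the answer) contrasting ReadArray, which duplicates the bytes into a managed array, with the raw pointer behind the view via SafeMemoryMappedViewHandle.AcquirePointer. This requires compiling with unsafe code enabled; `PointerAccess` and `ReadInPlace` are made-up names.

```csharp
using System;
using System.IO.MemoryMappedFiles;

class PointerAccess
{
    // Contrast the copying route (ReadArray) with direct in-place access
    // through the view's underlying pointer (unsafe code).
    public static unsafe int ReadInPlace()
    {
        using var mmf = MemoryMappedFile.CreateNew(null, 16);
        using var accessor = mmf.CreateViewAccessor();
        accessor.Write(0, 7);

        var copy = new byte[4];
        accessor.ReadArray(0, copy, 0, 4);   // copying route: duplicates the bytes

        byte* ptr = null;
        accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
        try
        {
            // PointerOffset corrects for view alignment (0 for a view at offset 0).
            return *(int*)(ptr + accessor.PointerOffset);
        }
        finally
        {
            accessor.SafeMemoryMappedViewHandle.ReleasePointer();
        }
    }

    static void Main() => Console.WriteLine(ReadInPlace()); // 7
}
```

This pointer route is the closest managed code gets to treating the mapped region as an array without copying, which is what question 2 was after.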

So reasoning this out: an MMF certainly isn't thread-safe by design, since it is shared memory, just like global variables are in your code. Things go wrong in exactly the same way if threads read and write the same section of the shared memory, and you have to protect against that in exactly the same way: with a lock that ensures only one thread at a time can access a shared section.
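A minimal sketch of that within-process locking (my own example, with made-up names): many parallel read-modify-write cycles on the same four bytes, serialized by a lock so no increments are lost.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

class LockedCounter
{
    static readonly object Gate = new object();

    // Many parallel read-modify-write cycles on the SAME 4 bytes; without
    // the lock, racing increments would be lost.
    public static int LockedIncrements(int iterations)
    {
        using var mmf = MemoryMappedFile.CreateNew(null, 4);
        using var accessor = mmf.CreateViewAccessor();

        Parallel.For(0, iterations, _ =>
        {
            lock (Gate)
            {
                int value = accessor.ReadInt32(0);   // read...
                accessor.Write(0, value + 1);        // ...modify...write
            }
        });
        return accessor.ReadInt32(0);
    }

    static void Main() => Console.WriteLine(LockedIncrements(1000)); // 1000
}
```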

Also note that you need to implement that locking between the processes that read and write the MMF, which tends to be painful: you have to use a named mutex that the "master" process creates and the "slave" process opens. You cannot skimp on this locking requirement. Notably, you never mentioned taking care of this in your question, so red flag there.
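The named-mutex pattern might look like the following sketch (mine, not the answerer's; the mutex name "MyMmfLock" is a made-up example, and a real "slave" process would construct a Mutex with the same name to open it):

```csharp
using System;
using System.Threading;

class CrossProcessLock
{
    // The "master" process creates the named mutex; a "slave" process opens
    // the same name and contends for it before touching the MMF.
    public static bool GuardedAccess()
    {
        using var mutex = new Mutex(initiallyOwned: false, "MyMmfLock");

        mutex.WaitOne();               // blocks until no other process holds it
        try
        {
            // ... read or write the shared section of the MMF here ...
            return true;
        }
        finally
        {
            mutex.ReleaseMutex();      // always release, even on exceptions
        }
    }

    static void Main() => Console.WriteLine(GuardedAccess());
}
```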

Within one process, threads that don't access the same section of the MMF cannot get in each other's way, just like two threads that access different variables don't require any synchronization, as long as they hold the mutex that ensures that another process cannot write to the section. Note that this probably means you want to use a semaphore to protect the MMF access, since a mutex can only be acquired by one thread.
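The disjoint-sections idea can be sketched like this (my own example with invented names): each Parallel.For partition gets its own view over its own byte range, so no two threads ever touch the same section and no intra-process lock is needed.

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

class PartitionedProcessing
{
    // Fill each partition of the map from its own thread, each through its
    // own view over a disjoint byte range, so no section is shared.
    public static (int first, int last) FillByPartition(int ints, int parts)
    {
        using var mmf = MemoryMappedFile.CreateNew(null, (long)ints * sizeof(int));
        int chunk = ints / parts;

        Parallel.For(0, parts, part =>
        {
            using var view = mmf.CreateViewAccessor(
                (long)part * chunk * sizeof(int), (long)chunk * sizeof(int));
            for (int i = 0; i < chunk; i++)
                view.Write((long)i * sizeof(int), part);   // tag values with partition id
        });

        using var check = mmf.CreateViewAccessor();
        return (check.ReadInt32(0), check.ReadInt32((long)(ints - 1) * sizeof(int)));
    }

    static void Main() => Console.WriteLine(FillByPartition(4096, 4)); // (0, 3)
}
```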

Hans Passant answered Nov 06 '22