I don't have much experience with memory-mapped I/O, but after using it for the first time I'm stunned at how fast it is. In my performance tests, I'm seeing that reading from memory-mapped files is 30X faster than reading through regular C++ stdio.
My test data is a 3 GB binary file containing 20 large double-precision floating-point arrays. My test program calls an external module's read method, which uses memory-mapped I/O behind the scenes. Every call to the read method returns a pointer to the data along with its size, and on return I call memcpy to copy the contents of that buffer into another array. Since I'm still paying for a memcpy out of the memory-mapped file, I expected the mapped reads to be only marginally faster than normal stdio, but I'm astonished that they are 30X faster.
Why is reading from a memory mapped file so fast?
PS: I'm on a Windows machine. I benchmarked my I/O speeds, and my machine's maximum disk transfer rate is around 90 MiB/s.
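For reference, the mapped-read-plus-memcpy pattern looks roughly like this if I inline what the external module does (the file name is illustrative and error handling is omitted; my real code goes through the module's read method):

    // Sketch of the test pattern: map the file, then memcpy out of the view.
    // Mapping a whole 3 GB file like this needs a 64-bit build.
    #include <windows.h>
    #include <cstring>
    #include <vector>

    int main() {
        HANDLE file = CreateFileW(L"data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

        // Map the whole file; pages are faulted in lazily on first access.
        const char* base = static_cast<const char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

        LARGE_INTEGER size;
        GetFileSizeEx(file, &size);

        // The memcpy from the question: copy the mapped bytes into another array.
        std::vector<char> dest(static_cast<size_t>(size.QuadPart));
        std::memcpy(dest.data(), base, dest.size());

        UnmapViewOfFile(base);
        CloseHandle(mapping);
        CloseHandle(file);
    }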
Accessing files via memory mapping is faster than using I/O functions such as fread and fwrite. Data are read and written through the virtual memory capabilities built into the operating system, rather than by allocating, copying into, and then deallocating data buffers owned by the process.
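For contrast, a minimal sketch of the buffered stdio path the question measures against (hypothetical file name, error handling omitted; _fseeki64/_ftelli64 are the MSVC 64-bit offset variants, needed because the file is 3 GB):

    #include <cstdio>
    #include <vector>

    int main() {
        std::FILE* f = std::fopen("data.bin", "rb");

        // 64-bit seek/tell, since the file is larger than 2 GB.
        _fseeki64(f, 0, SEEK_END);
        long long size = _ftelli64(f);
        _fseeki64(f, 0, SEEK_SET);

        // fread moves data: kernel page cache -> stdio buffer -> dest buffer.
        // A mapped read needs only the one copy: page cache -> dest.
        std::vector<char> dest(static_cast<size_t>(size));
        std::fread(dest.data(), 1, dest.size(), f);
        std::fclose(f);
    }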
Performance: memory-mapped writing is often fast because no stream or file buffers are used; the OS does the actual file writing, usually in blocks of several kilobytes at once. One downside is that unless you write sequentially, page faults can slow your program down.

Using wide vector instructions for the data copy makes effective use of memory bandwidth, and combined with CPU prefetching this makes mmap extremely fast.
A memory-mapped file exposes the contents of a file through virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to that memory.
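A minimal sketch of that idea on Windows (hypothetical file shared.bin, assumed to already exist and be non-empty; error handling omitted): a plain store through the view modifies the file itself.

    #include <windows.h>

    int main() {
        HANDLE file = CreateFileW(L"shared.bin", GENERIC_READ | GENERIC_WRITE,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
        double* values = static_cast<double*>(
            MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));

        values[0] = 42.0;           // dirty page; the OS writes it back lazily
        FlushViewOfFile(values, 0); // or force the write-back now

        UnmapViewOfFile(values);
        CloseHandle(mapping);
        CloseHandle(file);
    }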
The OS kernel routines for I/O, like read or write calls, are still just functions. Those functions copy data between a user-space buffer and a kernel-space structure, and from there to a device. When you consider that there is a user buffer, an I/O library buffer (the stdio buffer, for example), a kernel buffer, and then the file, the data may go through as many as three copies between your program and the disk. The I/O routines also have to be robust, and the system calls themselves impose latency (trapping into the kernel, a context switch, and waking the process up again).

When you memory-map a file, you skip much of that, eliminating buffer copies. By effectively treating the file like a big virtual array, you get random access without the system-call overhead, lowering the latency per I/O; and if the original code is inefficient (many small random I/O calls), the overhead is reduced even more drastically.
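A sketch of both paths for a single random read (hypothetical file and index, error handling omitted): stdio pays a seek plus a read system call each time, while the mapped view pays an ordinary load, plus at most one page fault on first touch.

    #include <windows.h>
    #include <cstdio>

    int main() {
        // stdio path: _fseeki64 + fread trap into the kernel on every access.
        std::FILE* f = std::fopen("data.bin", "rb");
        double v1 = 0.0;
        _fseeki64(f, 1000000LL * sizeof(double), SEEK_SET);
        std::fread(&v1, sizeof v1, 1, f);
        std::fclose(f);

        // Mapped path: the same read is an ordinary load from the view.
        HANDLE file = CreateFileW(L"data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        const double* values = static_cast<const double*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
        double v2 = values[1000000];

        std::printf("%f %f\n", v1, v2);
        UnmapViewOfFile(values);
        CloseHandle(mapping);
        CloseHandle(file);
    }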
The abstractions of a virtual-memory, multiprocessing OS have a price, and this is it.
You can, however, improve I/O in some cases by disabling buffering when you know it will hurt performance, such as for large contiguous writes, but beyond that you really can't improve on the performance of memory-mapped I/O without eliminating the OS altogether.
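For example, a minimal sketch of disabling the stdio buffer with setvbuf before a large contiguous write (hypothetical output file):

    #include <cstdio>
    #include <vector>

    int main() {
        std::FILE* f = std::fopen("out.bin", "wb");
        // Unbuffered mode: fwrite hands data straight to the OS instead of
        // staging it in the library buffer first. Must be set before any I/O.
        std::setvbuf(f, nullptr, _IONBF, 0);

        std::vector<double> data(1 << 20, 3.14);
        std::fwrite(data.data(), sizeof(double), data.size(), f);
        std::fclose(f);
    }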