I don't have much experience with memory-mapped I/O, but after using it for the first time I'm stunned at how fast it is. In my performance tests, I'm seeing that reading from memory-mapped files is 30X faster than reading through regular C++ stdio.
My test data is a 3 GB binary file containing 20 large double-precision floating-point arrays. My test program calls an external module's read method, which uses memory-mapped I/O behind the scenes. Every call to the read method returns a pointer to the data along with its size, and on return I call memcpy to copy the contents of that buffer into another array. Since I'm still paying for a memcpy out of the memory-mapped file, I expected the mapped reads to be only marginally faster than normal stdio, but I'm astonished that they are 30X faster.
Why is reading from a memory mapped file so fast?
PS: I'm on a Windows machine. I benchmarked my I/O speeds, and my machine's maximum disk transfer rate is around 90 MiB/s.
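For reference, the mapped-read-plus-memcpy pattern looks roughly like this if I inline what the external module does (the file name is illustrative and error handling is omitted; my real code goes through the module's read method):

    // Sketch of the test pattern: map the file, then memcpy out of the view.
    // Mapping a whole 3 GB file like this needs a 64-bit build.
    #include <windows.h>
    #include <cstring>
    #include <vector>

    int main() {
        HANDLE file = CreateFileW(L"data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

        // Map the whole file; pages are faulted in lazily on first access.
        const char* base = static_cast<const char*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));

        LARGE_INTEGER size;
        GetFileSizeEx(file, &size);

        // The memcpy from the question: copy the mapped bytes into another array.
        std::vector<char> dest(static_cast<size_t>(size.QuadPart));
        std::memcpy(dest.data(), base, dest.size());

        UnmapViewOfFile(base);
        CloseHandle(mapping);
        CloseHandle(file);
    }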
Accessing files via memory mapping is faster than using I/O functions such as fread and fwrite. Data are read and written through the virtual memory capabilities built into the operating system, rather than by allocating, copying into, and then deallocating data buffers owned by the process.
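For contrast, a minimal sketch of the buffered stdio path the question measures against (hypothetical file name, error handling omitted; _fseeki64/_ftelli64 are the MSVC 64-bit offset variants, needed because the file is 3 GB):

    #include <cstdio>
    #include <vector>

    int main() {
        std::FILE* f = std::fopen("data.bin", "rb");

        // 64-bit seek/tell, since the file is larger than 2 GB.
        _fseeki64(f, 0, SEEK_END);
        long long size = _ftelli64(f);
        _fseeki64(f, 0, SEEK_SET);

        // fread moves data: kernel page cache -> stdio buffer -> dest buffer.
        // A mapped read needs only the one copy: page cache -> dest.
        std::vector<char> dest(static_cast<size_t>(size));
        std::fread(dest.data(), 1, dest.size(), f);
        std::fclose(f);
    }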
Performance: memory-mapped writing is often fast because no stream or file buffers are used; the OS does the actual file writing, usually in blocks of several kilobytes at once. One downside is that unless you write sequentially, page faults can slow your program down.

Using wide vector instructions for the data copy makes effective use of memory bandwidth, and combined with CPU prefetching this makes mmap extremely fast.
A memory-mapped file exposes the contents of a file through virtual memory. This mapping between a file and memory space enables an application, including multiple processes, to modify the file by reading and writing directly to that memory.
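A minimal sketch of that idea on Windows (hypothetical file shared.bin, assumed to already exist and be non-empty; error handling omitted): a plain store through the view modifies the file itself.

    #include <windows.h>

    int main() {
        HANDLE file = CreateFileW(L"shared.bin", GENERIC_READ | GENERIC_WRITE,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READWRITE, 0, 0, nullptr);
        double* values = static_cast<double*>(
            MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0, 0));

        values[0] = 42.0;           // dirty page; the OS writes it back lazily
        FlushViewOfFile(values, 0); // or force the write-back now

        UnmapViewOfFile(values);
        CloseHandle(mapping);
        CloseHandle(file);
    }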
The OS kernel routines for I/O, like read or write calls, are still just functions. Those functions copy data between a user-space buffer and a kernel-space structure, and from there to a device. When you consider that there is a user buffer, an I/O library buffer (the stdio buffer, for example), a kernel buffer, and then the file, the data may go through as many as three copies between your program and the disk. The I/O routines also have to be robust, and the system calls themselves impose latency (trapping into the kernel, a context switch, and waking the process up again).

When you memory-map a file, you skip much of that, eliminating buffer copies. By effectively treating the file like a big virtual array, you get random access without the system-call overhead, lowering the latency per I/O; and if the original code is inefficient (many small random I/O calls), the overhead is reduced even more drastically.
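A sketch of both paths for a single random read (hypothetical file and index, error handling omitted): stdio pays a seek plus a read system call each time, while the mapped view pays an ordinary load, plus at most one page fault on first touch.

    #include <windows.h>
    #include <cstdio>

    int main() {
        // stdio path: _fseeki64 + fread trap into the kernel on every access.
        std::FILE* f = std::fopen("data.bin", "rb");
        double v1 = 0.0;
        _fseeki64(f, 1000000LL * sizeof(double), SEEK_SET);
        std::fread(&v1, sizeof v1, 1, f);
        std::fclose(f);

        // Mapped path: the same read is an ordinary load from the view.
        HANDLE file = CreateFileW(L"data.bin", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        const double* values = static_cast<const double*>(
            MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
        double v2 = values[1000000];

        std::printf("%f %f\n", v1, v2);
        UnmapViewOfFile(values);
        CloseHandle(mapping);
        CloseHandle(file);
    }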
The abstractions of a virtual-memory, multiprocessing OS have a price, and this is it.
You can, however, improve I/O in some cases by disabling buffering when you know it will hurt performance, such as for large contiguous writes, but beyond that you really can't improve on the performance of memory-mapped I/O without eliminating the OS altogether.
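For example, a minimal sketch of disabling the stdio buffer with setvbuf before a large contiguous write (hypothetical output file):

    #include <cstdio>
    #include <vector>

    int main() {
        std::FILE* f = std::fopen("out.bin", "wb");
        // Unbuffered mode: fwrite hands data straight to the OS instead of
        // staging it in the library buffer first. Must be set before any I/O.
        std::setvbuf(f, nullptr, _IONBF, 0);

        std::vector<double> data(1 << 20, 3.14);
        std::fwrite(data.data(), sizeof(double), data.size(), f);
        std::fclose(f);
    }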