Assuming the address space can cover the file, it appears to me that `mmap` simply allocates a chunk of memory as large as the file about to be read and creates a 1-to-1 relation between their corresponding blocks. But why does doing so speed up file reading? It seems that in order to actually get the contents of the file, you still have to go to the disk and read all the bytes on it.

What difference does it make, compared to `malloc`'ing the same amount of memory and manually reading the entire file into the `malloc`'ed area?
`mmap` works differently. It is anticipatory and adapts to the program's access pattern. Also, specific policies can be set via `madvise` to further fine-tune usage.
For a much more thorough discussion of how `mmap` works in a demand-paging environment, see my answer here: Which segments are affected by a copy-on-write?, as it also talks about the use of `mmap`.
`mmap` is the lifeblood of program execution via `execve` et al., so you can bet it's fast. As a side note, it is ironic that `malloc` actually uses anonymous `mmap` as well.
But, for the discussion here, note in particular the "backing store" (i.e. paging disk) for a file with `mmap` vs. doing `malloc` and `read(2)`:
With `mmap`, the backing store for the memory area is the file itself. The area maps pages directly to the kernel's filesystem buffer pages [they've been unified for a long time]. So, no [wasteful] copy from the kernel filesystem buffer pages to application pages is required, as is done for `read(2)`.
When you do `malloc`/`read`, you still have the above pages, but now the `malloc`'ed area has a backing store on the paging/swap disk, so there are twice as many page buffers as with `mmap`. And, as mentioned, the data must be copied into the area when the read is done.
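To make the contrast concrete, here is a minimal C sketch of the two approaches (an illustration only, not code from this answer; error handling is abbreviated, and the page-stride loop is just there to force the pages in):

```c
/*
 * Sketch: read a file via mmap vs. malloc + read(2).
 * Illustrative only; error handling abbreviated; path comes from argv[1].
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* mmap: pages are backed by the file itself -- no copy into
       application pages and no use of the paging disk. */
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i += 4096)
        sum += map[i];              /* pages fault in on first access */
    munmap(map, st.st_size);

    /* malloc + read: the buffer is backed by the paging/swap disk, and
       the kernel copies from its filesystem buffer pages into it. */
    char *buf = malloc(st.st_size);
    if (buf == NULL)
        return 1;
    ssize_t n;
    off_t off = 0;
    while (off < st.st_size &&
           (n = read(fd, buf + off, st.st_size - off)) > 0)
        off += n;
    free(buf);
    close(fd);

    printf("%ld\n", sum);           /* keep the accesses observable */
    return 0;
}
```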
Also, doing one large read is sub-optimal in terms of performance. The recommended size is around 64 KB per chunk [filesystem dependent].
When you do the large read, your program can't start until it completes. If the file is larger than physical memory, the system will read into your `malloc`'ed area and will wastefully start flushing the earlier pages to the paging disk to make room for the ones near the end of the file, until the entire file is read in.

In other words, the application waits [doing nothing] while this large preread occurs. For [say] a 60 GB file, the startup time will be noticeable.
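As a sketch of the chunked alternative (the ~64 KiB chunk size and the `process()` callback are illustrative assumptions), the application can start working on the first chunk while the kernel reads ahead, instead of stalling behind one huge preread:

```c
/* Sketch: read and process a file in ~64 KiB chunks; process() is a
   hypothetical application callback. */
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (64 * 1024)           /* filesystem dependent */

static void process(const char *buf, ssize_t len)
{
    (void)buf; (void)len;           /* application work goes here */
}

void read_in_chunks(const char *path)
{
    char buf[CHUNK];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return;
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        process(buf, n);            /* work overlaps kernel readahead */
    close(fd);
}
```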
If your file is truly large enough, you will even run out of space on the paging disk (i.e. `malloc` returns NULL).
For `mmap`, there are no such problems. When you map a file, you can start using it immediately. It will "fault in" on demand, directly from the area's backing store [which, once again, is the file in the filesystem]. And if you've got [say] a 1 TB file, `mmap` handles that just fine.
Also, you can control the mapping policy via `madvise(2)` and `posix_madvise(2)` on a page-by-page basis, for any page range, or for the whole file. The `madvise` syscall is relatively lightweight, so it's fine to use it a lot. It's a hint, and it doesn't do I/O that would delay the application. If I/O is started to preread for the hint, the kernel does it as a background activity.
You can even tell the system that a given page will be needed soon [and the system takes this as a hint to prefetch it], or you can tell the system that a page is no longer needed [and the system will release its page buffer memory].

You can say things like "sequential access" for the entire file, which means the system will know to do the preread automatically, as well as release pages that are no longer needed (i.e. if you're currently accessing page N, the system releases any page before N-k).
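Put together, those hints might look like this on an `mmap`'ed file (a sketch: the `MADV_*` constants are the standard ones, but the window size `k` and the page arithmetic are illustrative assumptions):

```c
/* Sketch of madvise(2) hints on an mmap'ed file; window sizes are
   illustrative assumptions. */
#include <stddef.h>
#include <sys/mman.h>

void advise(char *map, size_t filesize, size_t pagesize, size_t cur)
{
    /* Whole file: announce sequential access so the kernel prereads
       ahead of us and can drop pages behind us. */
    madvise(map, filesize, MADV_SEQUENTIAL);

    /* The next page will be needed soon: hint a background prefetch
       (returns immediately; no I/O wait in the caller). */
    madvise(map + (cur + 1) * pagesize, pagesize, MADV_WILLNEED);

    /* Pages before cur - k are finished: let the kernel release their
       buffer memory (they can still refault from the file later). */
    size_t k = 16;                  /* assumed window */
    if (cur > k)
        madvise(map, (cur - k) * pagesize, MADV_DONTNEED);
}
```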
When you do a `read(2)`, there is no way to tell the system that the given kernel FS page buffers are no longer needed. They will linger until physical RAM fills up [or a given limit is exceeded], and this adds pressure on the entire memory system.
In practice, using `read`, I've seen the amount of memory used for FS buffers remain high long after an application has moved on to a different part of a file, or to a different file altogether. In fact, I've seen a single I/O-intensive application use so many buffers that it caused unrelated [idle] processes to have their pages stolen and flushed to the paging disk. When I stopped the I/O application, it took several minutes for firefox to page itself back in and become responsive again.
I did some extensive benchmarks of regular `read` vs. `mmap`. From them, `mmap` can improve speed for certain applications.

See my answer here: read line by line in the most efficient way *platform specific*

Before I did this, I was skeptical of the benefits of `mmap`, but the benchmarks show that `mmap` is a winner.
Also, if you're doing `read(2)` (for speed) vs. `fgets`, you may get bogged down by the buffer shifting you have to do if a given line spans a read-buffer boundary (i.e. the last 50 chars of your buffer have the first 50 bytes of an 80-char line).
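The shape of that buffer shifting is roughly the following (a hypothetical sketch, not code from the linked answer; `handle_line` is an assumed callback):

```c
/* Sketch of the buffer shifting that read(2)-based line scanning
   requires; lines longer than the buffer are not handled here. */
#include <string.h>
#include <unistd.h>

void scan_lines(int fd)
{
    char buf[65536];
    size_t have = 0;                /* valid bytes currently in buf */
    ssize_t n;

    while ((n = read(fd, buf + have, sizeof(buf) - have)) > 0) {
        have += (size_t)n;
        char *start = buf, *nl;
        while ((nl = memchr(start, '\n', have - (size_t)(start - buf)))) {
            /* handle_line(start, (size_t)(nl - start)); */
            start = nl + 1;
        }
        /* A partial line remains at the end: shift it to the front
           before the next read -- this is the cost being described. */
        have -= (size_t)(start - buf);
        memmove(buf, start, have);
    }
}
```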
Note that in the comments on that linked page there is another link, to a pastebin with a later version of my benchmark program and its results (too large to post in the SO answer above), which benchmarks and compares various `madvise` options.
I was curious about this, so I tried benchmarking whole-file reads for files of sizes 1, 2, 4, 8, etc. bytes, once with `mmap` (M) and once with `read` (R) (theoretically one call with the `fstat`-ed size, but retrying if that call returned a partial result). After the reading/mmapping, one byte of each mmapped/read page was accessed in a non-optimizable fashion.
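The per-page access can be made non-optimizable with a volatile sink, roughly like this (a sketch of the idea, not the exact benchmark code):

```c
/* Sketch: touch one byte per page through a volatile sink so the
   compiler cannot drop the loads. */
#include <stddef.h>

volatile unsigned char sink;

void touch_pages(const unsigned char *data, size_t len, size_t pagesize)
{
    for (size_t i = 0; i < len; i += pagesize)
        sink = data[i];             /* forces each page to fault in */
}
```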
Here are my results:
Size M(µs) R(µs)
1 9.5 4.2
2 10.8 4.5
4 8.4 3.8
8 8.6 3.8
16 7.3 4
32 7.8 3.5
64 8.3 3.9
128 9.2 4.6
256 8.6 4.7
512 10.6 5.1
1.0Ki 9.8 4.7
2.0Ki 10.1 5.4
4.0Ki 10.5 5.6
8.0Ki 10.4 6.9
16Ki 9.9 10
32Ki 14.4 12.8
64Ki 16.1 23.7
128Ki 28.1 41.1
256Ki 34.5 82.4
512Ki 57.9 154.6
1.0Mi 103.5 325.8
2.0Mi 188.5 919.8
4.0Mi 396.3 1963.2
8.0Mi 798.8 3885
16Mi 1611.4 7660.2
32Mi 3207.4 23040.2
64Mi 6712.1 84491.9
It appears `read` is about twice as fast up to about 16Ki. From then on, `mmap` starts winning big time (for 64MiB files, by a factor of 12).
(Tested on Linux with kernel 3.19 on my laptop; 10^4 repeated reads of the same file.)