 

How does mmap improve file reading speed?

Assuming the address space can cover the file, it appears to me that mmap simply allocates a chunk of memory as large as the file about to be read, and creates a 1-to-1 relation between their corresponding blocks. However, why does doing so speed up file reading? It seems that in order to actually get the contents of the file, you still have to go to disk and read all the bytes from it.

What difference does it make, compared to malloc'ing the same amount of memory and manually reading the entire file into the malloc'ed area?

asked May 11 '16 20:05 by OneZero


2 Answers

mmap works differently. It is anticipatory and adapts to the program's access pattern. Also, specific policies can be set via madvise(2) to further fine-tune usage.

For a much more thorough discussion of how mmap works in a demand-paging environment, see my answer here: Which segments are affected by a copy-on-write? It also talks about the use of mmap.

mmap is the lifeblood of program execution via execve et al. So, you can bet it's fast. As a side note, it is ironic that malloc actually uses anonymous mmap as well.

But for the discussion here, in particular, note the "backing store" (i.e. paging disk) for a file with mmap vs. doing malloc and read(2).

With mmap, the backing store for the memory area is the file itself. The area maps pages directly to the kernel's filesystem buffer pages [they've been unified for a long time]. So, no [wasteful] copy from the kernel filesystem buffer pages to application pages is required as is done for read(2).

When you do malloc/read, you still have the above pages, but, now the malloc'ed area has a backing store on the paging/swap disk. So, twice as many page buffers as with mmap. As I mentioned, data must be copied into the area when a read is done.
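To make the two paths concrete, here is a minimal sketch in C (mine, not code from this answer); the names map_file and slurp_file are made up for the example, and error handling is kept to a minimum:

    /* mmap path: pages are backed by the file itself and fault in on
       demand; nothing is copied out of the kernel's page cache. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    void *map_file(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        fstat(fd, &st);
        *len = (size_t) st.st_size;

        void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                      /* the mapping survives the close */
        return (p == MAP_FAILED) ? NULL : p;
    }

    /* malloc + read path: the data is copied from the kernel filesystem
       buffers into the malloc'ed area, which itself is backed by the
       paging/swap disk. */
    void *slurp_file(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        fstat(fd, &st);
        *len = (size_t) st.st_size;

        char *buf = malloc(*len);
        size_t off = 0;
        while (buf && off < *len) {
            ssize_t n = read(fd, buf + off, *len - off);
            if (n <= 0) {               /* error or unexpected EOF */
                free(buf);
                buf = NULL;
                break;
            }
            off += (size_t) n;
        }
        close(fd);
        return buf;
    }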

Also, doing a large read is sub-optimal in terms of performance. The recommended size is around 64 KB in a chunk [filesystem dependent].

When you do the large read, your program can't start until it completes. If the size of the file is larger than physical memory, the system will read into your malloc area, and will wastefully start flushing the earlier pages to the paging disk to make room for the ones near the end of the file until the entire file is read in.

In other words, the application is just waiting [and doing nothing] while this large preread occurs. For [say] a 60 GB file, the startup time will be noticeable.

If your file is truly large enough, you will even run out of space on the paging disk (i.e. malloc returns NULL).

For mmap, there are no such problems. When you map a file, you can start to use it immediately. It will "fault in" on demand directly from the area's backing store [which, once again, is the file in the filesystem]. And, if you've got [say] a 1 TB file, mmap handles that just fine.

Also, you can control the mapping policy via madvise(2) and posix_madvise(2) on a page-by-page basis, or for any page range, including the whole file. The madvise syscall is relatively lightweight, so it's fine to use it a lot. It's only a hint, and doesn't do I/O that will delay the application. If I/O is started to preread for the hint, it's done by the kernel as a background activity.

You can even tell the system that a given page will be needed soon [and the system takes this as a hint to prefetch it] or you can tell the system that the page is no longer needed [and the system will release the page buffer memory].

You can say things like "sequential access" for the entire file, which means the system will know to do the preread automatically, as well as the release of the pages that are no longer needed (i.e. if you're currently accessing page N, then the system releases any page before N-k).
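A minimal sketch of those hints (my illustration, not code from this answer), assuming addr and len come from an earlier mmap of the file and off is a page-aligned offset into it:

    #define _DEFAULT_SOURCE             /* for the MADV_* constants on glibc */
    #include <sys/mman.h>
    #include <unistd.h>

    static void advise_mapping(void *addr, size_t len, size_t off)
    {
        size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);

        /* Whole mapping: expect sequential access, so the kernel can read
           ahead aggressively and drop pages behind the current position. */
        madvise(addr, len, MADV_SEQUENTIAL);

        /* One page that will be needed soon: a prefetch hint; any read-ahead
           is done by the kernel in the background, the call doesn't block. */
        madvise((char *) addr + off, pagesz, MADV_WILLNEED);

        /* One page that is no longer needed: the kernel may release its
           page buffer memory right away. */
        madvise((char *) addr + off, pagesz, MADV_DONTNEED);

        /* posix_madvise(2) offers the portable spellings
           POSIX_MADV_SEQUENTIAL, POSIX_MADV_WILLNEED, POSIX_MADV_DONTNEED. */
    }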

When you do a read(2), there is no way to tell the system that the given kernel FS page buffers are no longer needed. They will linger around until physical RAM fills up [or a given limit is exceeded], and this adds pressure to the entire memory system.

In practice, using read, I've seen the amount of memory used for FS buffers remain high long after an application has moved on to a different part of a file or a different file altogether. In fact, I've seen a single I/O intensive application use so many buffers that it caused unrelated [idle] processes to have their pages stolen and flushed to the paging disk. When I stopped the I/O application, it took a number of minutes for firefox to page itself back in and become responsive again.

I did some extensive benchmarks for regular read vs mmap. They show that mmap can improve speed for certain applications.

See my answer here: read line by line in the most efficient way *platform specific*

Before I did this, I was skeptical of the mmap benefits, but the benchmarks show that mmap is a winner.

Also, if you're doing read(2) (for speed) vs. fgets, you may get bogged down by the buffer shifting you have to do if a given line spans a read buffer boundary (i.e. the last 50 chars of your buffer hold the first 50 bytes of an 80-char line, and the rest only arrives with the next read).
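For illustration (a sketch I'm adding, with a hypothetical handle_line callback), that buffer shifting looks roughly like this in a plain read(2) loop:

    #include <string.h>
    #include <unistd.h>

    #define BUFSZ 65536                 /* ~64 KB chunks, as suggested above */

    static void scan_lines(int fd)
    {
        char buf[BUFSZ];
        size_t have = 0;                /* valid bytes currently in buf */

        for (;;) {
            ssize_t n = read(fd, buf + have, sizeof(buf) - have);
            if (n <= 0)
                break;                  /* EOF, error, or a line longer than the buffer */
            have += (size_t) n;

            /* Consume every complete line currently in the buffer. */
            char *start = buf;
            char *nl;
            while ((nl = memchr(start, '\n', have - (size_t) (start - buf))) != NULL) {
                /* handle_line(start, (size_t) (nl - start)); */
                start = nl + 1;
            }

            /* The shifting in question: move the partial last line to the
               front so the next read(2) can append to it. */
            have -= (size_t) (start - buf);
            memmove(buf, start, have);
        }
    }

With mmap, by contrast, the whole file is one contiguous range of memory, so a line never straddles a buffer boundary in the first place.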

Note that in the comments on that linked page there is a further pastebin link to a later version of my benchmark program and its results, which was too large to post in the SO answer itself and which benchmarks and compares various madvise options.

answered Oct 24 '22 17:10 by Craig Estey


I was curious about this, so I benchmarked whole-file reads for files of sizes 1, 2, 4, 8, etc. bytes, once with mmap (M) and once with read (R) (in theory a single read call with the fstat-ed size, retried if it returned a partial result). After the reading/mmapping, one byte of each mapped/read page was accessed in a non-optimizable fashion.

Here's my results:

Size   M(µs)   R(µs)
1      9.5     4.2
2      10.8    4.5
4      8.4     3.8
8      8.6     3.8
16     7.3     4
32     7.8     3.5
64     8.3     3.9
128    9.2     4.6
256    8.6     4.7
512    10.6    5.1
1.0Ki  9.8     4.7
2.0Ki  10.1    5.4
4.0Ki  10.5    5.6
8.0Ki  10.4    6.9
16Ki   9.9     10
32Ki   14.4    12.8
64Ki   16.1    23.7
128Ki  28.1    41.1
256Ki  34.5    82.4
512Ki  57.9    154.6
1.0Mi  103.5   325.8
2.0Mi  188.5   919.8
4.0Mi  396.3   1963.2
8.0Mi  798.8   3885
16Mi   1611.4  7660.2
32Mi   3207.4  23040.2
64Mi   6712.1  84491.9

It appears read is about twice as fast up to about 16Ki. From then on, mmap starts winning big time (for 64MiB files by a factor of 12).

(Tested on Linux 3.19 on my laptop, with 10^4 repeated reads of the same file.)
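For reference, a rough reconstruction of the harness described above might look like the following (my sketch of the described setup, not the actual benchmark code; error handling is omitted and a non-empty file is assumed):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <time.h>
    #include <unistd.h>

    static volatile unsigned char sink; /* keeps the page touches from being optimized out */

    /* Touch one byte per page, as described above. */
    static void touch_pages(const unsigned char *p, size_t len, size_t pagesz)
    {
        for (size_t i = 0; i < len; i += pagesz)
            sink = p[i];
    }

    static double elapsed_us(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e6 + (b.tv_nsec - a.tv_nsec) / 1e3;
    }

    /* M variant: mmap the whole file, then touch each page. */
    static double run_mmap(const char *path, size_t pagesz)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        size_t len = (size_t) st.st_size;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        touch_pages(p, len, pagesz);
        munmap(p, len);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        close(fd);
        return elapsed_us(t0, t1);
    }

    /* R variant: one read of the fstat-ed size, retried on partial results. */
    static double run_read(const char *path, size_t pagesz)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        size_t len = (size_t) st.st_size;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unsigned char *buf = malloc(len);
        size_t off = 0;
        while (off < len) {
            ssize_t n = read(fd, buf + off, len - off);
            if (n <= 0)
                break;
            off += (size_t) n;
        }
        touch_pages(buf, off, pagesz);
        free(buf);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        close(fd);
        return elapsed_us(t0, t1);
    }

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
        double m = 0, r = 0;
        for (int i = 0; i < 10000; i++) {   /* 10^4 repeats, averaged */
            m += run_mmap(argv[1], pagesz);
            r += run_read(argv[1], pagesz);
        }
        printf("mmap %.1f us   read %.1f us\n", m / 10000, r / 10000);
        return 0;
    }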

answered Oct 24 '22 19:10 by PSkocik