 

Why mmap() (Memory Mapped File) is faster than read()

I was working on something involving the MappedByteBuffer of Java NIO recently. I've read some posts about it, and all of them mention that "mmap() is faster than read()".

My understanding so far:

  1. I treat MappedByteBuffer == Memory Mapped File == mmap()

  2. read() has to move data through: disk file -> kernel -> application, so it involves context switches and buffer copying.

  3. They all say mmap() involves less copying and fewer syscalls than read(), but as far as I know it also needs to read from the disk file the first time you access the file data. So the first access goes: virtual address -> page fault -> disk file -> kernel -> memory. Apart from the fact that you can access it randomly, the last three steps (disk file -> kernel -> memory) are exactly the same as read(), so how can mmap() involve less copying or fewer syscalls than read()?

  4. What is the relationship between mmap() and the swap file? Is it that the OS puts the least recently used file data in memory into swap (LRU)? So the second time you access that data, the OS retrieves it from swap rather than from the disk file (no need to copy into a kernel buffer) - is that why mmap() involves less copying and fewer syscalls?

  5. In Java, a MappedByteBuffer is allocated outside the heap (it's a direct buffer). So when you read from a MappedByteBuffer, does that mean it needs one extra memory copy from outside the Java heap into the Java heap?
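For reference, here is a minimal Java sketch of the two code paths I'm comparing (the temp file and helper names are just placeholders, not from any real benchmark):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapVsRead {
    // read() path: data is copied from the kernel's page cache into 'buf'
    static byte firstByteViaRead(FileChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1);
        ch.read(buf, 0);          // one syscall, one copy into user space
        return buf.get(0);
    }

    // mmap path: the page-cache page itself is mapped into our address space
    static byte firstByteViaMap(FileChannel ch) throws IOException {
        MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, 1);
        return mapped.get(0);     // plain memory access after the initial map()
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mmap-demo", ".bin");
        Files.write(path, "hello".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            System.out.println((char) firstByteViaRead(ch)); // h
            System.out.println((char) firstByteViaMap(ch));  // h
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Both paths return the same data; my question is about what happens underneath.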

Could anyone answer my questions? Thanks :)

Alexis asked Mar 18 '15 03:03


1 Answer

1: Yes, that is essentially what a MappedByteBuffer is.

2: "disk file -> kernel" doesn't necessarily involve copying.

3: With a memory-mapped file, once the kernel has read the file into its page cache, it can simply map that part of the cache into your process's address space - instead of having to copy the data from the cache into a location your process specifies.

4: If the kernel decides to swap out a page from a memory-mapped file, it will not write the page to the page file; it will write the page to the original file (the one it's mapped from) before discarding the page. Writing it to the page file would be unnecessary and waste page file space.

5: Yes. For example, if you call get(byte[]) then the data will be copied from the off-heap mapping into your array. Note that functions such as get(byte[]) need to copy data for any type of buffer - this is not specific to memory-mapped files.
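A minimal sketch of the copy described in point 5 (the temp file here is just a placeholder). The mapped buffer itself is off-heap; the bulk get(byte[]) is what copies its contents into an on-heap array:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BulkGetDemo {
    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bulkget", ".bin");
        Files.write(path, "copied to heap".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

            // The mapping lives outside the Java heap:
            System.out.println(mapped.isDirect()); // true

            // get(byte[]) copies from the off-heap mapping into an on-heap array
            byte[] onHeap = new byte[(int) ch.size()];
            mapped.get(onHeap);
            System.out.println(new String(onHeap, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Single-byte reads like get(index) don't need that copy, but anything that fills a byte[] necessarily moves data onto the heap.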

user253751 answered Nov 09 '22 13:11