I have an application that sequentially reads data from a file. Some is read directly from a pointer into the mmap'ed file and other parts are memcpy'ed from the file to another buffer. I noticed poor performance when doing a large memcpy of all the memory that I needed (1MB blocks) and better performance when doing a lot of smaller memcpy calls (in my tests, I used 4KB, the page size, which took 1/3 of the time to run). I believe that the issue is a very large number of major page faults when using a large memcpy.
I've tried various tuning parameters (MAP_POPULATE, MADV_WILLNEED, MADV_SEQUENTIAL) without any noticeable improvement.
I'm not sure why many small memcpy calls should be faster; it seems counter-intuitive. Is there any way to improve this?
Results and test code follow.
Running on CentOS 7 (linux 3.10.0), default compiler (gcc 4.8.5), reading 29GB file from a RAID array of regular disks.
Running with /usr/bin/time -v:

4KB memcpy:
User time (seconds): 5.43
System time (seconds): 10.18
Percent of CPU this job got: 75%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.59
Major (requiring I/O) page faults: 4607
Minor (reclaiming a frame) page faults: 7603470
Voluntary context switches: 61840
Involuntary context switches: 59
1MB memcpy:
User time (seconds): 6.75
System time (seconds): 8.39
Percent of CPU this job got: 23%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.71
Major (requiring I/O) page faults: 302965
Minor (reclaiming a frame) page faults: 7305366
Voluntary context switches: 302975
Involuntary context switches: 96
MADV_WILLNEED did not seem to have much impact on the 1MB copy result.
MADV_SEQUENTIAL slowed down the 1MB copy result by so much that I didn't wait for it to finish (at least 7 minutes).
MAP_POPULATE slowed the 1MB copy result by about 15 seconds.
Simplified code used for the test:
#include <algorithm>
#include <iostream>
#include <stdexcept>

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    try {
        char *filename = argv[1];

        int fd = open(filename, O_RDONLY);
        if (fd == -1) {
            throw std::runtime_error("Failed open()");
        }

        off_t file_length = lseek(fd, 0, SEEK_END);
        if (file_length == (off_t)-1) {
            throw std::runtime_error("Failed lseek()");
        }

        int mmap_flags = MAP_PRIVATE;
#ifdef WITH_MAP_POPULATE
        mmap_flags |= MAP_POPULATE; // Small performance degradation if enabled
#endif
        void *map = mmap(NULL, file_length, PROT_READ, mmap_flags, fd, 0);
        if (map == MAP_FAILED) {
            throw std::runtime_error("Failed mmap()");
        }

#ifdef WITH_MADV_WILLNEED
        madvise(map, file_length, MADV_WILLNEED);   // No difference in performance if enabled
#endif
#ifdef WITH_MADV_SEQUENTIAL
        madvise(map, file_length, MADV_SEQUENTIAL); // Massive performance degradation if enabled
#endif

        const uint8_t *file_map_i = static_cast<const uint8_t *>(map);
        const uint8_t *file_map_end = file_map_i + file_length;

        size_t memcpy_size = MEMCPY_SIZE;
        uint8_t *buffer = new uint8_t[memcpy_size];

        while (file_map_i != file_map_end) {
            size_t this_memcpy_size = std::min(memcpy_size, static_cast<std::size_t>(file_map_end - file_map_i));
            memcpy(buffer, file_map_i, this_memcpy_size);
            file_map_i += this_memcpy_size;
        }
    }
    catch (const std::exception &e) {
        std::cerr << "Caught exception: " << e.what() << std::endl;
    }

    return 0;
}
If the underlying file and disk systems aren't fast enough, whether you use mmap() or POSIX open()/read() or standard C fopen()/fread() or C++ iostream won't matter much at all.
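When the disks are the bottleneck, even the simplest buffered read() loop will reach the same throughput. As a minimal baseline sketch (the 1MB buffer size and the helper name read_all are my own choices, not from the question):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Baseline: sequential buffered read() loop.
// Returns the number of bytes read, or -1 on error.
long long read_all(const char *path)
{
    int fd = ::open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    static char buffer[1024 * 1024];   // 1MB per syscall; tune as needed
    long long total = 0;
    ssize_t n;
    while ((n = ::read(fd, buffer, sizeof buffer)) > 0)
        total += n;

    ::close(fd);
    return (n < 0) ? -1 : total;
}
```

Timing this against the mmap version on the same 29GB file is the quickest way to see whether the page-fault overhead, rather than the disks, is what you're measuring.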
If performance really matters and the underlying file and disk system(s) are fast enough, though, mmap() is probably the worst possible way to read a file sequentially. The creation of mapped pages is a relatively expensive operation, and since each byte of data is read only once, that cost per actual access can be extreme. Using mmap() can also increase memory pressure on your system. You can explicitly munmap() pages after you read them, but then your processing can stall while the mappings are torn down.
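One way to bound that memory pressure is to map the file in fixed-size windows and munmap each window as soon as it has been consumed. A sketch under the assumption that summing bytes stands in for real processing (the 64MB window and the function name are mine):

```cpp
#include <algorithm>
#include <cstdint>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map the file in fixed-size windows, unmapping each window after use
// so resident memory stays bounded. The window must be a multiple of
// the page size (64MB here). Returns the byte sum, or -1 on error.
long long sum_file_mapped(const char *path, size_t window = 64UL * 1024 * 1024)
{
    int fd = ::open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t length = ::lseek(fd, 0, SEEK_END);
    long long sum = 0;

    for (off_t off = 0; off < length; off += window) {
        size_t len = std::min<off_t>(window, length - off);
        void *map = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (map == MAP_FAILED) {
            ::close(fd);
            return -1;
        }
        const uint8_t *p = static_cast<const uint8_t *>(map);
        for (size_t i = 0; i < len; ++i)
            sum += p[i];               // stand-in for real processing
        ::munmap(map, len);            // release pages immediately
    }

    ::close(fd);
    return sum;
}
```

This keeps the mapping cost per window, so it doesn't eliminate the fault overhead; it only stops a 29GB file from crowding the page cache all at once.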
Using direct IO will probably be the fastest, especially for large files as there's not a massive number of page faults involved. Direct IO bypasses the page cache, which is a good thing for data read only once. Caching data read only once - never to be reread - is not only useless but potentially counterproductive as CPU cycles get used to evict useful data from the page cache.
Example (error checking omitted for clarity):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main( int argc, char **argv )
{
    // vary this to find optimal size
    // (must be a multiple of page size)
    size_t copy_size = 1024UL * 1024UL;

    // get a page-aligned buffer
    char *buffer;
    ::posix_memalign( reinterpret_cast<void **>( &buffer ),
        ( size_t ) ( 4UL * 1024UL ), copy_size );

    // make sure the entire buffer's virtual-to-physical mappings
    // are actually done (can actually matter with large buffers and
    // extremely fast IO systems)
    ::memset( buffer, 0, copy_size );

    int fd = ::open( argv[ 1 ], O_RDONLY | O_DIRECT );

    for ( ;; )
    {
        ssize_t bytes_read = ::read( fd, buffer, copy_size );
        if ( bytes_read <= 0 )
        {
            break;
        }
    }

    return( 0 );
}
Some caveats exist when using direct IO on Linux. File system support can be spotty, and implementations of direct IO can be finicky. You probably have to use a page-aligned buffer to read data in, and you may not be able to read the very last page of the file if it's not a full page.
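One common workaround for those caveats is to do the aligned bulk of the reading with O_DIRECT and pick up the unaligned tail (and any file system that refuses O_DIRECT entirely) through an ordinary buffered descriptor. A sketch, with the fallback logic and function name being my own assumptions rather than anything from the answer above:

```cpp
#include <cstdlib>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Read the bulk of a file with O_DIRECT, then fetch any unaligned
// tail through a second, ordinary buffered descriptor.
// Returns bytes read, or -1 on error.
long long read_direct_with_tail(const char *path)
{
    const size_t chunk = 1024UL * 1024UL;   // multiple of the page size

    int dfd = ::open(path, O_RDONLY | O_DIRECT);
    if (dfd == -1)
        dfd = ::open(path, O_RDONLY);       // file system lacks O_DIRECT
    if (dfd == -1)
        return -1;

    void *raw = nullptr;
    if (::posix_memalign(&raw, 4096, chunk) != 0) {
        ::close(dfd);
        return -1;
    }
    char *buffer = static_cast<char *>(raw);

    long long total = 0;
    ssize_t n;
    while ((n = ::read(dfd, buffer, chunk)) > 0)
        total += n;
    ::close(dfd);
    ::free(raw);

    // If the direct reads stopped short of EOF (some file systems
    // won't return the final partial block), read the rest buffered.
    struct stat st;
    if (::stat(path, &st) == 0 && total < st.st_size) {
        int fd = ::open(path, O_RDONLY);
        if (fd != -1) {
            char tail[4096];
            while ((n = ::pread(fd, tail, sizeof tail, total)) > 0)
                total += n;
            ::close(fd);
        }
    }
    return total;
}
```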