I have an application that sequentially reads data from a file. Some is read directly from a pointer into the mmap'ed file and other parts are memcpy'ed from the file to another buffer. I noticed poor performance when doing a large memcpy of all the memory that I needed (1MB blocks) and better performance when doing a lot of smaller memcpy calls (in my tests, I used 4KB, the page size, which took 1/3 of the time to run). I believe that the issue is a very large number of major page faults when using a large memcpy.
I've tried various tuning parameters (MAP_POPULATE, MADV_WILLNEED, MADV_SEQUENTIAL) without any noticeable improvement.
I'm not sure why many small memcpy calls should be faster; it seems counter-intuitive. Is there any way to improve this?
Results and test code follow.
Running on CentOS 7 (linux 3.10.0), default compiler (gcc 4.8.5), reading 29GB file from a RAID array of regular disks.
Running with /usr/bin/time -v:

4KB memcpy:
User time (seconds): 5.43
System time (seconds): 10.18
Percent of CPU this job got: 75%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.59
Major (requiring I/O) page faults: 4607
Minor (reclaiming a frame) page faults: 7603470
Voluntary context switches: 61840
Involuntary context switches: 59
1MB memcpy:
User time (seconds): 6.75
System time (seconds): 8.39
Percent of CPU this job got: 23%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:03.71
Major (requiring I/O) page faults: 302965
Minor (reclaiming a frame) page faults: 7305366
Voluntary context switches: 302975
Involuntary context switches: 96
MADV_WILLNEED did not seem to have much impact on the 1MB copy result.
MADV_SEQUENTIAL slowed down the 1MB copy result by so much that I didn't wait for it to finish (at least 7 minutes).
MAP_POPULATE slowed the 1MB copy result by about 15 seconds.
Simplified code used for the test:
#include <algorithm>
#include <iostream>
#include <stdexcept>

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    try {
        char *filename = argv[1];

        int fd = open(filename, O_RDONLY);
        if (fd == -1) {
            throw std::runtime_error("Failed open()");
        }

        off_t file_length = lseek(fd, 0, SEEK_END);
        if (file_length == (off_t)-1) {
            throw std::runtime_error("Failed lseek()");
        }

        int mmap_flags = MAP_PRIVATE;
#ifdef WITH_MAP_POPULATE
        mmap_flags |= MAP_POPULATE; // Small performance degradation if enabled
#endif
        void *map = mmap(NULL, file_length, PROT_READ, mmap_flags, fd, 0);
        if (map == MAP_FAILED) {
            throw std::runtime_error("Failed mmap()");
        }

#ifdef WITH_MADV_WILLNEED
        madvise(map, file_length, MADV_WILLNEED);   // No difference in performance if enabled
#endif
#ifdef WITH_MADV_SEQUENTIAL
        madvise(map, file_length, MADV_SEQUENTIAL); // Massive performance degradation if enabled
#endif

        const uint8_t *file_map_i = static_cast<const uint8_t *>(map);
        const uint8_t *file_map_end = file_map_i + file_length;

        size_t memcpy_size = MEMCPY_SIZE;
        uint8_t *buffer = new uint8_t[memcpy_size];

        while (file_map_i != file_map_end) {
            size_t this_memcpy_size = std::min(memcpy_size, static_cast<std::size_t>(file_map_end - file_map_i));
            memcpy(buffer, file_map_i, this_memcpy_size);
            file_map_i += this_memcpy_size;
        }
    }
    catch (const std::exception &e) {
        std::cerr << "Caught exception: " << e.what() << std::endl;
    }

    return 0;
}
If the underlying file and disk systems aren't fast enough, whether you use mmap() or POSIX open()/read() or standard C fopen()/fread() or C++ iostream won't matter much at all.
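When the disks are the bottleneck, even the simplest buffered read() loop will reach the same throughput. As a minimal baseline sketch (the 1MB buffer size and the helper name read_all are my own choices, not from the question):

```cpp
#include <fcntl.h>
#include <unistd.h>

// Baseline: sequential buffered read() loop.
// Returns the number of bytes read, or -1 on error.
long long read_all(const char *path)
{
    int fd = ::open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    static char buffer[1024 * 1024];   // 1MB per syscall; tune as needed
    long long total = 0;
    ssize_t n;
    while ((n = ::read(fd, buffer, sizeof buffer)) > 0)
        total += n;

    ::close(fd);
    return (n < 0) ? -1 : total;
}
```

Timing this against the mmap version on the same 29GB file is the quickest way to see whether the page-fault overhead, rather than the disks, is what you're measuring.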
If performance really matters and the underlying file and disk system(s) are fast enough, though, mmap() is probably the worst possible way to read a file sequentially. The creation of mapped pages is a relatively expensive operation, and since each byte of data is read only once, that cost per actual access can be extreme. Using mmap() can also increase memory pressure on your system. You can explicitly munmap() pages after you read them, but then your processing can stall while the mappings are torn down.
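One way to bound that memory pressure is to map the file in fixed-size windows and munmap each window as soon as it has been consumed. A sketch under the assumption that summing bytes stands in for real processing (the 64MB window and the function name are mine):

```cpp
#include <algorithm>
#include <cstdint>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Map the file in fixed-size windows, unmapping each window after use
// so resident memory stays bounded. The window must be a multiple of
// the page size (64MB here). Returns the byte sum, or -1 on error.
long long sum_file_mapped(const char *path, size_t window = 64UL * 1024 * 1024)
{
    int fd = ::open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    off_t length = ::lseek(fd, 0, SEEK_END);
    long long sum = 0;

    for (off_t off = 0; off < length; off += window) {
        size_t len = std::min<off_t>(window, length - off);
        void *map = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (map == MAP_FAILED) {
            ::close(fd);
            return -1;
        }
        const uint8_t *p = static_cast<const uint8_t *>(map);
        for (size_t i = 0; i < len; ++i)
            sum += p[i];               // stand-in for real processing
        ::munmap(map, len);            // release pages immediately
    }

    ::close(fd);
    return sum;
}
```

This keeps the mapping cost per window, so it doesn't eliminate the fault overhead; it only stops a 29GB file from crowding the page cache all at once.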
Using direct IO will probably be the fastest, especially for large files as there's not a massive number of page faults involved. Direct IO bypasses the page cache, which is a good thing for data read only once. Caching data read only once - never to be reread - is not only useless but potentially counterproductive as CPU cycles get used to evict useful data from the page cache.
Example (error checking omitted for clarity):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main( int argc, char **argv )
{
    // vary this to find optimal size
    // (must be a multiple of page size)
    size_t copy_size = 1024UL * 1024UL;

    // get a page-aligned buffer
    char *buffer;
    ::posix_memalign( reinterpret_cast<void **>( &buffer ),
        ( size_t ) ( 4UL * 1024UL ), copy_size );

    // make sure the entire buffer's virtual-to-physical mappings
    // are actually done (can actually matter with large buffers and
    // extremely fast IO systems)
    ::memset( buffer, 0, copy_size );

    int fd = ::open( argv[ 1 ], O_RDONLY | O_DIRECT );

    for ( ;; )
    {
        ssize_t bytes_read = ::read( fd, buffer, copy_size );
        if ( bytes_read <= 0 )
        {
            break;
        }
    }

    return( 0 );
}
Some caveats exist when using direct IO on Linux. File system support can be spotty, and implementations of direct IO can be finicky. You probably have to use a page-aligned buffer to read data in, and you may not be able to read the very last page of the file if it's not a full page.
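One common workaround for those caveats is to do the aligned bulk of the reading with O_DIRECT and pick up the unaligned tail (and any file system that refuses O_DIRECT entirely) through an ordinary buffered descriptor. A sketch, with the fallback logic and function name being my own assumptions rather than anything from the answer above:

```cpp
#include <cstdlib>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Read the bulk of a file with O_DIRECT, then fetch any unaligned
// tail through a second, ordinary buffered descriptor.
// Returns bytes read, or -1 on error.
long long read_direct_with_tail(const char *path)
{
    const size_t chunk = 1024UL * 1024UL;   // multiple of the page size

    int dfd = ::open(path, O_RDONLY | O_DIRECT);
    if (dfd == -1)
        dfd = ::open(path, O_RDONLY);       // file system lacks O_DIRECT
    if (dfd == -1)
        return -1;

    void *raw = nullptr;
    if (::posix_memalign(&raw, 4096, chunk) != 0) {
        ::close(dfd);
        return -1;
    }
    char *buffer = static_cast<char *>(raw);

    long long total = 0;
    ssize_t n;
    while ((n = ::read(dfd, buffer, chunk)) > 0)
        total += n;
    ::close(dfd);
    ::free(raw);

    // If the direct reads stopped short of EOF (some file systems
    // won't return the final partial block), read the rest buffered.
    struct stat st;
    if (::stat(path, &st) == 0 && total < st.st_size) {
        int fd = ::open(path, O_RDONLY);
        if (fd != -1) {
            char tail[4096];
            while ((n = ::pread(fd, tail, sizeof tail, total)) > 0)
                total += n;
            ::close(fd);
        }
    }
    return total;
}
```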