Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is sequentially reading a large file row by row with mmap and madvise sequential slower than fgets?

Overview

I have a program bounded significantly by IO and am trying to speed it up. Using mmap seemed to be a good idea, but it actually degrades the performance relative to just using a series of fgets calls.

Some demo code

I've squeezed down demos to just the essentials, testing against an 800mb file with about 3.5million lines:

With fgets:

char buf[4096];
FILE * fp = fopen(argv[1], "r");

while(fgets(buf, 4096, fp) != 0) {
    // do stuff
}
fclose(fp);
return 0;

Runtime for 800mb file:

[juhani@xtest tests]$ time ./readfile /r/40/13479/14960 

real    0m25.614s
user    0m0.192s
sys 0m0.124s

The mmap version:

struct stat finfo;
int fh, len;
char * mem;
char * row, *end;
if(stat(argv[1], &finfo) == -1) return 0;
if((fh = open(argv[1], O_RDONLY)) == -1) return 0;

mem = (char*)mmap(NULL, finfo.st_size, PROT_READ, MAP_SHARED, fh, 0);
if(mem == (char*)-1) return 0;
madvise(mem, finfo.st_size, POSIX_MADV_SEQUENTIAL);
row = mem;
while((end = strchr(row, '\n')) != 0) {
    // do stuff
    row = end + 1;
}
munmap(mem, finfo.st_size);
close(fh);

Runtime varies quite a bit, but never faster than fgets:

[juhani@xtest tests]$ time ./readfile_map /r/40/13479/14960

real    0m28.891s
user    0m0.252s
sys 0m0.732s
[juhani@xtest tests]$ time ./readfile_map /r/40/13479/14960

real    0m42.605s
user    0m0.144s
sys 0m0.472s

Other notes

  • Watching the process run in top, the memmapped version generated a few thousand page faults along the way.
  • CPU and memory usage are both very low for the fgets version.

Questions

  • Why is this the case? Is it just because the buffered file access implemented by fopen/fgets is better than the aggressive prefetching that mmap with madvise POSIX_MADV_SEQUENTIAL?
  • Is there an alternative method of possibly making this faster(Other than on-the-fly compression/decompression to shift IO load to the processor)? Looking at the runtime of 'wc -l' on the same file, I'm guessing this might not be the case.
like image 407
juhanic Avatar asked May 19 '11 08:05

juhanic


1 Answers

POSIX_MADV_SEQUENTIAL is only a hint to the system and may be completely ignored by a particular POSIX implementation.

The difference between your two solutions is that mmap requires the file to be mapped into the virtual address space entierly, whereas fgets has the IO entirely done in kernel space and just copies the pages into a buffer that doesn't change.

This also has more potential for overlap, since the IO is done by some kernel thread.

You could perhaps increase the perceived performance of the mmap implementation by having one (or more) independent threads reading the first byte of each page. This (or these) thread then would have all the page faults and the time your application thread would come at a particular page it would already be loaded.

like image 132
Jens Gustedt Avatar answered Oct 03 '22 19:10

Jens Gustedt