If I am right, on Linux (in C/C++, gcc/g++), one can read data from a regular file using the read(2) or mmap(2) syscalls.
Two questions: Does the read syscall use mmap internally? When is the first faster than the second, and vice versa?
If you're reading the file sequentially, my default choice would be to call read repeatedly into a largish buffer.
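A minimal sketch of that sequential pattern, assuming a 1 MiB buffer (the size is an assumption, not a measured optimum) and using summing the bytes as a stand-in for real processing. The names `sum_bytes_read` and `SEQ_BUF_SIZE` are hypothetical. It retries on `EINTR` and treats a return value of 0 as end of file, since `read` may return fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical buffer size: 1 MiB is a common "largish" choice. */
#define SEQ_BUF_SIZE (1 << 20)

/* Sum every byte of the file at `path` by reading it sequentially.
 * Returns the sum, or -1 on error. */
long long sum_bytes_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(SEQ_BUF_SIZE);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long long sum = 0;
    for (;;) {
        ssize_t n = read(fd, buf, SEQ_BUF_SIZE);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            sum = -1;            /* real I/O error */
            break;
        }
        if (n == 0)
            break;               /* end of file */
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];       /* stand-in for real per-chunk work */
    }

    free(buf);
    close(fd);
    return sum;
}
```

Each call moves up to one buffer's worth of data, so the number of syscalls stays small even for large files.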
If you're accessing small bits of data scattered around a large file, the choice is less clear, but mmap could lead to more readable code, since you can write it as if the file were already in memory. Which approach performs better in this case is hard to tell a priori.
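To illustrate the "as if the file were already in memory" point, here is a sketch that maps a whole file read-only. The type `struct mapped_file` and the helpers `map_file`/`unmap_file` are hypothetical names for this example. Empty files are special-cased because `mmap` rejects a length of 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only view of a whole file. */
struct mapped_file {
    const unsigned char *data;  /* NULL if the file is empty */
    size_t size;
};

/* Map the whole file at `path` read-only. Returns 0 on success, -1 on error. */
int map_file(const char *path, struct mapped_file *mf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    mf->size = (size_t)st.st_size;
    mf->data = NULL;
    if (mf->size > 0) {         /* mmap fails with EINVAL for length 0 */
        void *p = mmap(NULL, mf->size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        mf->data = p;
    }

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return 0;
}

/* Release the mapping created by map_file. */
void unmap_file(struct mapped_file *mf)
{
    if (mf->data != NULL)
        munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}
```

After `map_file` succeeds, a scattered access is plain indexing, e.g. `if (off < mf.size) byte = mf.data[off];`, with pages faulted in on demand instead of being fetched by explicit syscalls.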
If you're writing performance-critical code, then the only way to ascertain performance is by benchmarking/profiling actual code.
General rule of thumb:
if you are reading a file sequentially from start to end, you can use read() without a performance hit.
if you are reading a file with random access, mmap() will usually result in better performance than a comparable lseek()/read() combination.
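Since the random-access rule of thumb is exactly the kind of claim worth benchmarking, as advised above, here is a rough harness that times the same pseudo-random single-byte reads done with `lseek()`+`read()` and through an `mmap` mapping. The function names, the LCG used to generate offsets, and the fixed seed are all assumptions made for this sketch; it is not a rigorous benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Deterministic pseudo-random offsets from a simple LCG (an assumption). */
static size_t next_offset(uint64_t *state, size_t size)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (size_t)((*state >> 33) % size);
}

/* Read `count` single bytes at pseudo-random offsets with lseek()+read().
 * Returns the byte sum, or -1 on error. */
static long long random_sum_lseek(int fd, size_t size, size_t count, uint64_t seed)
{
    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned char b;
        off_t off = (off_t)next_offset(&seed, size);
        /* pread(fd, &b, 1, off) would do this in a single syscall. */
        if (lseek(fd, off, SEEK_SET) < 0 || read(fd, &b, 1) != 1)
            return -1;
        sum += b;
    }
    return sum;
}

/* The same access pattern through a read-only mapping. */
static long long random_sum_mmap(const unsigned char *data, size_t size,
                                 size_t count, uint64_t seed)
{
    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[next_offset(&seed, size)];
    return sum;
}

/* Time both strategies on the file at `path`. Returns 0 and fills the
 * sums and timings on success; -1 on error or if the file is empty. */
int compare_random_access(const char *path, size_t count,
                          long long *sum_lseek, double *secs_lseek,
                          long long *sum_mmap, double *secs_mmap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    const uint64_t seed = 42;   /* same seed => same offsets for both runs */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_lseek = random_sum_lseek(fd, size, count, seed);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs_lseek = seconds_between(t0, t1);

    void *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_mmap = random_sum_mmap(p, size, count, seed);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs_mmap = seconds_between(t0, t1);

    munmap(p, size);
    return (*sum_lseek < 0) ? -1 : 0;
}
```

Equal sums confirm both paths read the same bytes. Note that the first run warms the page cache for the second, so for a fairer comparison swap the order, repeat each run several times, or test on a file larger than RAM.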