
Fast file reading

Tags: c++, c, linux, gcc

If I am right, on Linux (in C/C++, gcc/g++), one can read data from a regular file using read(2) or mmap(2) syscalls.

Two questions. Does the read syscall use mmap internally? When is the first faster than the second, and vice versa?

Cartesius00 asked Jan 18 '23 20:01


2 Answers

If you're reading the file sequentially, my default choice would be to repeatedly read into a largish buffer.
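
As a rough illustration of that approach, here is a minimal sketch of a sequential read loop. The helper name sum_bytes_read and the 64 KiB buffer size are assumptions made for the example, not recommendations; the right buffer size is something to measure on your own system.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sums every byte of `path` by reading the file
 * sequentially into a fixed-size buffer.
 * Returns 0 on success and -1 on error. */
int sum_bytes_read(const char *path, uint64_t *sum)
{
    enum { BUF_SIZE = 64 * 1024 };   /* assumed size; tune by measuring */
    unsigned char buf[BUF_SIZE];
    uint64_t total = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)                  /* end of file */
            break;
        if (n < 0) {
            if (errno == EINTR)      /* interrupted by a signal: retry */
                continue;
            close(fd);
            return -1;
        }
        /* read() may return fewer bytes than requested; use only n. */
        for (ssize_t i = 0; i < n; i++)
            total += buf[i];
    }

    close(fd);
    *sum = total;
    return 0;
}
```

The loop handles short reads and EINTR, which are the two details most often missed in hand-written read() loops.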

If you're accessing small bits of data scattered around a large file, the choice is less clear, but mmap could lead to more readable code (since you could code things up as if the file were already in memory). Which would give better performance in this case is hard to tell a priori.
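
To show what "as if the file were already in memory" can look like, here is a minimal sketch that maps a whole file read-only. The helper names map_file and unmap_file are hypothetical, and the sketch assumes the file fits in the address space and does not shrink while it is mapped.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole of `path` read-only.
 * On success returns a pointer to the file contents and stores the
 * length in *len. Returns NULL on error or if the file is empty
 * (a zero-length mapping is not allowed). */
const unsigned char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Releases a mapping obtained from map_file(). */
void unmap_file(const unsigned char *p, size_t len)
{
    munmap((void *)p, len);
}
```

After a successful call, scattered accesses are plain array indexing, for example p[offset], with the kernel paging data in on demand.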

If you're writing performance-critical code, then the only way to ascertain performance is by benchmarking/profiling actual code.

NPE answered Jan 20 '23 08:01


General rule of thumb:

  • If you are reading a file sequentially from start to end, you can use read() without a performance hit.

  • If you are reading a file with random access, mmap() will usually give better performance than an equivalent lseek()/read() combination.
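
For reference, the seek-then-read pattern that mmap() is being compared against might look like the sketch below. The helper names read_at_seek and read_at_pread are hypothetical. The second variant uses pread(2), which is not mentioned above; it performs the positioned read in a single system call and leaves the file offset untouched.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to `len` bytes starting at `offset`
 * using two system calls, lseek() followed by read().
 * Returns the number of bytes read, 0 at end of file, or -1 on error. */
ssize_t read_at_seek(int fd, void *buf, size_t len, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
}

/* Hypothetical helper: same result with one system call. pread() does
 * not change the file offset, so it is also safe when several threads
 * share the descriptor. */
ssize_t read_at_pread(int fd, void *buf, size_t len, off_t offset)
{
    return pread(fd, buf, len, offset);
}
```

Each random access through either helper costs at least one system call, whereas an access through a mapping costs a page fault only when the page is not yet resident. Whether that difference matters for a given workload is something to measure.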

gravitron answered Jan 20 '23 09:01